[Builder] Add support for Olive quantized models #1647
Conversation
Can we uncomment some of the CI models to test the quantized PyTorch to quantized ONNX path? (See onnxruntime-genai/test/python/_test_utils.py, lines 63 to 95 at f9a57f5.)
I can do it as part of this PR, but I'm not sure which ones to uncomment.
You can uncomment the models with the `"olive"` quant type.

- With the `k_quant` mixed precision `int4_algo`, select matmuls can be in 8 bits.
- `q_proj`, `k_proj`, and `v_proj` matmuls use the same configuration (bits and group_size) so that they can be merged without issues (see the merge-check sketch after this list).
- `quant_weight` and `dequant_weight` support a missing `g_idx` by using `repeat_interleave`; otherwise, we would have to create a trivial `g_idx` like the quark model does (see the dequantization sketch after this list).
- `pack_ort_format` supports 8-bit packing (see the packing sketch after this list).
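A minimal sketch of the precondition behind the merge bullet above, assuming a simple per-layer config map; the names `can_merge_qkv` and `quant_configs` are hypothetical and not the builder's API.

```python
# Hypothetical illustration: before fusing q/k/v into a single packed MatMul,
# confirm all three projections were quantized with identical bits and group_size.
def can_merge_qkv(quant_configs):
    """quant_configs maps layer name -> (bits, group_size)."""
    settings = {quant_configs[name] for name in ("q_proj", "k_proj", "v_proj")}
    return len(settings) == 1  # merging is only safe with one shared config
```

Because the recipe keeps `q_proj`/`k_proj`/`v_proj` on one shared (bits, group_size) setting even under `k_quant` mixed precision, this check holds and the three matmuls can be merged.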
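A minimal sketch of the `repeat_interleave` idea for handling a missing `g_idx`, assuming per-group scales and zero points stored contiguously along the input dimension; this is an illustration, not the builder's actual `quant_weight`/`dequant_weight` code.

```python
import torch

def dequant_weight_no_gidx(qweight, scales, zeros, group_size):
    """qweight: (in_features, out_features) integer codes.
    scales, zeros: (in_features // group_size, out_features) per-group values.

    Without a g_idx, group membership is implicit: rows in
    [i * group_size, (i + 1) * group_size) all belong to group i, so the
    per-group tensors can be expanded with repeat_interleave instead of
    being gathered through an index."""
    scales_full = scales.repeat_interleave(group_size, dim=0)  # per-row scales
    zeros_full = zeros.repeat_interleave(group_size, dim=0)    # per-row zero points
    return (qweight.float() - zeros_full.float()) * scales_full

# A g_idx path would instead gather: scales_full = scales[g_idx].
# The "trivial" g_idx mentioned for the quark model is just
# torch.arange(in_features) // group_size, which reproduces the layout above.
```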
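Finally, a sketch of what 8-bit support in `pack_ort_format` amounts to, assuming unsigned codes packed into a `uint8` buffer: 4-bit packing stores two values per byte, while 8-bit packing is essentially a pass-through. The exact nibble order and packing axis in onnxruntime's MatMulNBits layout may differ; this only illustrates the idea.

```python
import numpy as np

def pack_uint8(qweight, bits):
    """qweight: array of unsigned integer codes in [0, 2**bits). Returns a uint8 buffer."""
    q = qweight.astype(np.uint8)
    if bits == 8:
        return q                       # one code per byte, nothing to pack
    if bits == 4:
        assert q.shape[-1] % 2 == 0
        lo = q[..., 0::2] & 0x0F       # even positions -> low nibble
        hi = q[..., 1::2] & 0x0F       # odd positions  -> high nibble
        return (hi << 4) | lo
    raise ValueError(f"unsupported bit width: {bits}")
```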