
Conversation

jambayk
Contributor

@jambayk jambayk commented Jul 22, 2025

  • Support the new "olive" quant type.
  • Weight and zero-point packings are the same as GPTQ; there is no g_idx.
  • Similar to the k_quant mixed-precision int4_algo, select matmuls can be quantized to 8 bits.
    • Currently, we ensure that the q_proj, k_proj, and v_proj matmuls use the same configuration (bits and group_size) so that they can be merged without issues.
  • The modules are generalized to remove the requirement that all matmuls in a layer have the same bits and group_size.
  • quant_weight and dequant_weight support the no-g_idx case by using repeat_interleave; otherwise, we would have to create a trivial g_idx the way the quark model does (see the sketch after this list).
  • pack_ort_format supports 8-bit packing.
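
A minimal sketch of the no-g_idx dequantization path, assuming GPTQ-style affine quantization with contiguous groups along the input dimension (names and shapes here are illustrative, not the actual helpers in this PR):

import torch

def dequant_weight(qweight, scales, zeros, group_size):
    # qweight: (in_features, out_features) integer weights
    # scales, zeros: (in_features // group_size, out_features) per-group parameters
    # With no g_idx, group membership is implied by row order, so the per-group
    # scales and zero points can be expanded with repeat_interleave instead of
    # being gathered through a trivial g_idx.
    scales = scales.repeat_interleave(group_size, dim=0)
    zeros = zeros.repeat_interleave(group_size, dim=0)
    return (qweight.float() - zeros.float()) * scales

The same result could be obtained by building g_idx = torch.arange(in_features) // group_size and indexing the per-group tensors with it, which is what a trivial g_idx (as in the quark model) amounts to.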

@jambayk jambayk requested a review from kunal-vaishnavi July 22, 2025 22:38
@natke natke added the 0.9.0 label Jul 24, 2025
@kunal-vaishnavi
Contributor

Can we uncomment some of the CI models to test the quantized PyTorch to quantized ONNX path?

def get_model_paths():
    # TODO: Uncomment the following models as needed in the CI pipeline.
    hf_paths = {
        "phi-2": "microsoft/phi-2",
        # "olmo": "amd/AMD-OLMo-1B-SFT-DPO",
        "qwen-2.5": "Qwen/Qwen2.5-0.5B",
        # "phi-3.5": "microsoft/Phi-3.5-mini-instruct",
        # "llama-3.2": "meta-llama/Llama-3.2-1B-instruct",
        # "granite-3.0": "ibm-granite/granite-3.0-2b-instruct",
    }
    ci_data_path = os.path.join(get_ci_data_path(), "pytorch")
    if not os.path.exists(ci_data_path):
        return {}, hf_paths
    # Note: If a model has over 4B parameters, please add a quantized version
    # to `ci_paths` instead of `hf_paths` to reduce file size and testing time.
    ci_paths = {
        # "llama-2": os.path.join(ci_data_path, "Llama-2-7B-Chat-GPTQ"),
        # "llama-3": os.path.join(ci_data_path, "Meta-Llama-3-8B-AWQ"),
        # "mistral-v0.2": os.path.join(ci_data_path, "Mistral-7B-Instruct-v0.2-GPTQ"),
        "phi-2": os.path.join(ci_data_path, "phi2"),
        # "gemma-2b": os.path.join(ci_data_path, "gemma-1.1-2b-it"),
        # "gemma-7b": os.path.join(ci_data_path, "gemma-7b-it-awq"),
        # "phi-3-mini": os.path.join(ci_data_path, "phi3-mini-128k-instruct"),
        # "gemma-2-2b": os.path.join(ci_data_path, "gemma-2-2b-it"),
        # "llama-3.2": os.path.join(ci_data_path, "llama-3.2b-1b-instruct"),
        "qwen-2.5": os.path.join(ci_data_path, "qwen2.5-0.5b-instruct"),
        # "nemotron-mini": os.path.join(ci_data_path, "nemotron-mini-4b"),
    }
    return ci_paths, hf_paths

@jambayk
Contributor Author

jambayk commented Jul 28, 2025

Can we uncomment some of the CI models to test the quantized PyTorch to quantized ONNX path?


I can do it as part of this PR but not sure which ones to uncomment.

@jambayk jambayk requested a review from kunal-vaishnavi July 28, 2025 19:15
@kunal-vaishnavi
Contributor

Can we uncomment some of the CI models to test the quantized PyTorch to quantized ONNX path?


I can do it as part of this PR but not sure which ones to uncomment.

You can uncomment the models with GPTQ or AWQ in the name since they will go through quantized_model.py.
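
Concretely, that would mean re-enabling the GPTQ/AWQ entries in ci_paths, roughly as below (paths copied from the snippet above; whether the files are actually present in the CI data share is assumed here):

ci_paths = {
    "llama-2": os.path.join(ci_data_path, "Llama-2-7B-Chat-GPTQ"),
    "llama-3": os.path.join(ci_data_path, "Meta-Llama-3-8B-AWQ"),
    "mistral-v0.2": os.path.join(ci_data_path, "Mistral-7B-Instruct-v0.2-GPTQ"),
    "gemma-7b": os.path.join(ci_data_path, "gemma-7b-it-awq"),
    "phi-2": os.path.join(ci_data_path, "phi2"),
    "qwen-2.5": os.path.join(ci_data_path, "qwen2.5-0.5b-instruct"),
    # ... remaining commented entries unchanged ...
}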

@kunal-vaishnavi kunal-vaishnavi enabled auto-merge (squash) July 30, 2025 17:22
@kunal-vaishnavi kunal-vaishnavi merged commit 4aee929 into main Jul 30, 2025
14 of 16 checks passed
@kunal-vaishnavi kunal-vaishnavi deleted the jambayk/olive-quant branch July 30, 2025 17:33