
Conversation

anujj (Contributor) commented Sep 24, 2025

  • Keep NvTensorRtRtx as the user-facing name.
  • Automatically set extra_options["use_qdq"]=True when using TRT-RTX so that the
    Q/DQ quantization path is enabled by default.

anujj (Contributor, Author) commented Sep 24, 2025

anujj changed the title from "Add TRT-RTX EP support, normalize alias, and force QDQ" to "Add TRT-RTX EP support, keep NvTensorRtRtx as user facing name, and force QDQ" on Sep 25, 2025
anujj (Contributor, Author) commented Sep 25, 2025

Updated the PR to keep the user-facing name as NvTensorRtRtx.


if __name__ == '__main__':
    args = get_args()
    if args.execution_provider == "NvTensorRtRtx":
Contributor

We should handle this logic inside create_model so that users who import create_model can leverage this (e.g. Olive).

Contributor Author

I am forcing use_qdq in check_extra_options(), which is called before create_model:

# force use_qdq for trt-rtx
if args.execution_provider == "trt-rtx":
    kv_pairs["use_qdq"] = True

I had to switch the condition to if args.execution_provider == "NvTensorRtRtx":, which should be okay, I guess?

Contributor

This is a separate issue from what you are mentioning. The check that changes the EP name from NvTensorRtRtx to trt-rtx will not happen for users who import create_model. Here is an example that will fail with your current PR changes.

from onnxruntime_genai.models.builder import create_model

input_dir = ""
output_dir = "./phi-4-mini"
cache_dir = "./cache_dir"

model_name = "microsoft/Phi-4-mini-instruct"
precision = "int4"
ep = "NvTensorRtRtx"

extra_options = {
    "use_qdq": "true"
}

create_model(model_name, input_dir, output_dir, precision, ep, cache_dir, **extra_options)

In the above code, create_model will not replace NvTensorRtRtx with trt-rtx. This will cause the incorrect ONNX model to get created. After moving the replacement of the EP name inside create_model, the above code should work.
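
For illustration only, a rough sketch of what handling this inside create_model could look like, reusing the create_model signature quoted further down in this thread; the option value and the exact placement are assumptions, not the actual diff from this PR:

# Hypothetical sketch, not the actual PR change: apply the TRT-RTX defaults
# inside create_model so that importers such as Olive get the same behavior
# as the command-line path.
def create_model(model_name, input_path, output_dir, precision, execution_provider, cache_dir, **extra_options):
    if execution_provider == "NvTensorRtRtx":
        # TRT-RTX has no MatMulNBits support, so the int4 path needs Q/DQ.
        extra_options.setdefault("use_qdq", True)
        # ...any internal EP-name handling would also live here instead of
        # only in the argparse/CLI path...
    # ...existing model-building logic continues here...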


# force use_qdq for trt-rtx
if args.execution_provider == "trt-rtx":
    kv_pairs["use_qdq"] = True
Contributor

In the original implementation of this before it was removed, there was a comment which said that QDQ was used because opset 21 was desired. The default opset is now 21 but it seems that creating a QDQ model is the true intention.

I also remember there was an issue in Olive which originated from forcing use_qdq here. Could there be a future scenario where creating a TRT-RTX model without QDQ is desired?

Contributor Author

We don't support MatMulNBits in TRT-RTX; that's why it is always required. Without this, the int4 path will break.
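
For context, here is a rough sketch (not from this PR) of the two int4 weight patterns being discussed, built with the standard onnx helper API; the tensor names, dimensions, and attribute values are made up for illustration:

from onnx import helper

# Q/DQ path (use_qdq=True): the quantized weight is dequantized explicitly
# and a plain MatMul consumes the dequantized tensor.
dq = helper.make_node(
    "DequantizeLinear",
    inputs=["W_q", "W_scale", "W_zero_point"],
    outputs=["W_dq"],
    axis=0,
)
matmul = helper.make_node("MatMul", inputs=["X", "W_dq"], outputs=["Y"])

# MatMulNBits path: a single com.microsoft contrib op consumes the packed
# int4 weight. TRT-RTX has no kernel for this op, hence use_qdq is forced.
mm_nbits = helper.make_node(
    "MatMulNBits",
    inputs=["X", "W_packed", "W_scales"],
    outputs=["Y"],
    domain="com.microsoft",
    K=4096,          # illustrative input feature size
    N=4096,          # illustrative output feature size
    bits=4,
    block_size=32,
)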

Contributor

Will MatMulNBits be supported in the future?

Contributor Author

No plan for that, Kunal.

Contributor Author

Simplified the changes in the latest commit.

gaugarg-nv (Contributor)

@anujj we will also have to revert this PR in Olive: microsoft/Olive#2169

anujj (Contributor, Author) commented Sep 25, 2025

Raised a PR for that: microsoft/Olive#2182

xiaoyu-work pushed a commit to microsoft/Olive that referenced this pull request on Sep 26, 2025:

As per the discussion with the GenAI team, we are moving to the NvTensorRtRtx user-facing name.

GenAI PR: microsoft/onnxruntime-genai#1791
raise ValueError("Both 'exclude_lm_head' and 'include_hidden_states' cannot be used together. Please use only one of them at once.")



Contributor

Suggested change


@torch.no_grad
def create_model(model_name, input_path, output_dir, precision, execution_provider, cache_dir, **extra_options):

Contributor

Suggested change
