Add TRT-RTX EP support, keep NvTensorRtRtx as user facing name, and force QDQ #1791
base: main
Conversation
Updated PR to keep the user-facing name as NvTensorRtRtx.
src/python/py/models/builder.py
Outdated
if __name__ == '__main__':
    args = get_args()
    if args.execution_provider == "NvTensorRtRtx":
We should handle this logic inside create_model so that users who import create_model can leverage this (e.g. Olive).
I am forcing use_qdq in check_extra_options(), which is called before create_model:
# force use_qdq for trt-rtx
if args.execution_provider == "trt-rtx":
    kv_pairs["use_qdq"] = True
I had to switch the condition to if args.execution_provider == "NvTensorRtRtx":, which should be okay, I guess?
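For reference, a rough sketch of the combined result, assuming check_extra_options builds the kv_pairs dict from the parsed CLI args (the exact signature here is an assumption for illustration, not the actual builder.py code):

# Sketch only: force the QDQ path for the TRT-RTX EP, keyed on the
# user-facing name, before create_model runs.
def check_extra_options(kv_pairs, args):
    # force use_qdq for TRT-RTX (user-facing name "NvTensorRtRtx")
    if args.execution_provider == "NvTensorRtRtx":
        kv_pairs["use_qdq"] = True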
This is a separate issue from what you are mentioning. This check to change the EP name from NvTensorRtRtx to trt-rtx will not happen for users who import create_model. Here is an example which will fail with your current PR changes:
from onnxruntime_genai.models.builder import create_model
input_dir = ""
output_dir = "./phi-4-mini"
cache_dir = "./cache_dir"
model_name = "microsoft/Phi-4-mini-instruct"
precision = "int4"
ep = "NvTensorRtRtx"
extra_options = {
    "use_qdq": "true"
}
create_model(model_name, input_dir, output_dir, precision, ep, cache_dir, **extra_options)
In the above code, create_model will not replace NvTensorRtRtx with trt-rtx. This will cause the incorrect ONNX model to get created. After moving the replacement of the EP name inside create_model, the above code should work.
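A minimal sketch of that suggestion, assuming "trt-rtx" remains the internal EP name used by the builder; the placement and comments are illustrative, not the actual builder.py implementation:

def create_model(model_name, input_path, output_dir, precision, execution_provider, cache_dir, **extra_options):
    # Map the user-facing EP name to the internal one so that importers
    # (e.g. Olive) get the same behavior as CLI users, and force the QDQ
    # path since TRT-RTX has no MatMulNBits support.
    if execution_provider == "NvTensorRtRtx":
        execution_provider = "trt-rtx"
        extra_options["use_qdq"] = True
    # ... existing model-building logic continues here ...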
src/python/py/models/builder.py
Outdated
# force use_qdq for trt-rtx
if args.execution_provider == "trt-rtx":
    kv_pairs["use_qdq"] = True
In the original implementation of this before it was removed, there was a comment which said that QDQ was used because opset 21 was desired. The default opset is now 21, but it seems that creating a QDQ model is the true intention.
I also remember there was an issue in Olive which originated from forcing use_qdq here. Could there be a future scenario where creating a TRT-RTX model without QDQ is desired?
We don't support MatMulNBits in TRT-RTX; that's why QDQ is always required. Without it, the INT4 path will break.
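For context, a hypothetical validation helper (not part of this PR) showing why the QDQ form is the only workable INT4 path on this EP; the function name and message are assumptions for illustration:

def validate_int4_options(execution_provider: str, use_qdq: bool) -> None:
    # TRT-RTX cannot run the MatMulNBits contrib op, so a non-QDQ INT4 model
    # would contain ops the EP cannot execute.
    if execution_provider == "NvTensorRtRtx" and not use_qdq:
        raise ValueError("INT4 models for NvTensorRtRtx require use_qdq=True "
                         "(MatMulNBits is not supported by TRT-RTX).")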
Will MatMulNBits be supported in the future?
No plan for that, Kunal.
Simplified the changes in the latest commit.
@anujj we will also have to revert this PR in Olive: microsoft/Olive#2169
Raised a PR for that: microsoft/Olive#2182
As per the discussion with the GenAI team, we are moving to the NvTensorRtRtx user-facing name. GenAI PR: microsoft/onnxruntime-genai#1791
raise ValueError("Both 'exclude_lm_head' and 'include_hidden_states' cannot be used together. Please use only one of them at once.")
@torch.no_grad
def create_model(model_name, input_path, output_dir, precision, execution_provider, cache_dir, **extra_options):
Ensure the Q/DQ quantization path is enabled by default.
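Assuming the EP-name handling and the QDQ default both end up inside create_model, an importer like Olive could then call it without passing use_qdq explicitly; the paths and model name below are the ones from the example earlier in this thread:

from onnxruntime_genai.models.builder import create_model

create_model(
    "microsoft/Phi-4-mini-instruct",  # model_name
    "",                               # input_path (empty, as in the earlier example)
    "./phi-4-mini",                   # output_dir
    "int4",                           # precision
    "NvTensorRtRtx",                  # execution_provider (user-facing name)
    "./cache_dir",                    # cache_dir; use_qdq assumed to be forced internally
)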