rel-1.22.2 cherry-pick 1 #25633
Merged
Conversation
Add support for the Upsample operator to the op builder in QNN-EP.
- Enhance QNN-EP support for the Upsample operator.
- Add a unit test for the Upsample operator in QNN-EP.
[QNN EP] Add Einsum support for some equations. The intent is not to support all equations, but to enable them case by case to improve performance.
…#24640)

### Description
Enable use_vcpkg for the QNN Nuget package build and the Python arm64ec build.
NFC, move file location only.
### Description
Add LSTM support for QNN EP

### Motivation and Context
Add LSTM support for QNN EP
### Description
Update Qnn default version to 2.34.0.250424
The QNN [Softmax op defines a pre-scale parameter (`beta`)](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/MasterOpDef.html#softmax) into which we can fold a constant scalar multiply.
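To illustrate why this fold is valid, here is a minimal NumPy sketch (illustrative only, not QNN-EP code; the `softmax` helper below is hypothetical): scaling the input by a constant `c` before Softmax is equivalent to multiplying Softmax's `beta` by `c`, so `Mul(c) -> Softmax(beta=b)` can be rewritten as `Softmax(beta=b*c)`.

```python
import numpy as np

def softmax(x, beta=1.0):
    # Softmax with a pre-scale parameter beta: softmax(beta * x)
    z = beta * x
    z = z - np.max(z)          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

x = np.array([0.5, -1.0, 2.0])
c = 3.0

# Unfused graph: Mul(c) followed by Softmax(beta=1)
unfused = softmax(c * x, beta=1.0)

# Fused graph: the constant multiply folded into beta
fused = softmax(x, beta=c)

assert np.allclose(unfused, fused)
```

The equivalence holds because `beta` simply scales the Softmax input, which is exactly what the preceding constant multiply does.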
### Description
Remove the ep_weight_sharing_ctx_gen tool from the QNN EP python wheel
### Description
Added support for CumSum in QNN EP:
- Registered the CumSum op in QNN EP
- Added a unit test to verify accuracy and assignment of the op to QNN EP

### Motivation and Context
There was no support for the CumSum op in QNN EP
### Description
Update Qnn default version to 2.35.0.250530
* Re-enable tests and remove workarounds that were introduced as part of QNN <= 2.31 upgrades but are no longer necessary.

QNN/QAIRT releases about once a month. As ONNX Runtime adopts these new versions, some number of tests are often found to be impacted. Consequently, tests are skipped and tolerances are loosened. This change reverts as many as possible of the workarounds that were made for QNN upgrades between 2.17 and 2.31, inclusive. The most recent few releases were intentionally not examined, to minimize impact on users on old versions and to avoid lock-in to the bleeding edge.

Co-authored-by: Jeff Kilpatrick <[email protected]>
### Description
Update Qnn default version to 2.36.0.250627
### Description
- Adds multithreaded vectorized implementations of DequantizeLinear for int8 and uint8 inputs:
  - Intel SSE 2
  - ARM NEON
- All other architectures fall back to a multithreaded scalar reference implementation (the previous one was not multithreaded).
- **Note**: only enabled if ORT is built for client/on-device workloads (`ORT_CLIENT_PACKAGE_BUILD` is defined).

INT8 DequantizeLinear latency on Intel Core i9-10920X with 4 intra-op threads (SSE 2 implementation):

| Number of elements | Baseline latency (us) | Multithreaded+SIMD latency (us) | Speedup |
| --- | --- | --- | --- |
| 10 K | 1 | 1 | 1 |
| 20 K | 2 | 2 | 1 |
| 40 K | 5 | 5 | 1 |
| 80 K | 11 | 4 | 2.75 |
| 100 K | 14 | 5 | 2.80 |
| 150 K | 21 | 7 | 3.00 |
| 200 K | 28 | 8 | 3.50 |
| 400 K | 68 | 15 | 4.53 |
| 600 K | 107 | 21 | 5.10 |
| 800 K | 142 | 28 | 5.07 |
| 1 M | 187 | 42 | 4.45 |
| 2 M | 376 | 102 | 3.69 |
| 4 M | 880 | 236 | 3.73 |
| 6 M | 1547 | 557 | 2.78 |
| 8 M | 2438 | 1097 | 2.22 |
| 10 M | 3192 | 1464 | 2.18 |
| 100 M | 38718 | 17733 | 2.18 |

INT8 DequantizeLinear latency on Snapdragon 8cx gen 3 @ 3.4GHz with 4 intra-op threads (NEON implementation):

| Number of elements | Baseline latency (us) | Multithreaded+SIMD latency (us) | Speedup |
| --- | --- | --- | --- |
| 10 K | 1 | 1 | 1 |
| 20 K | 1 | 1 | 1 |
| 40 K | 3 | 3 | 1 |
| 80 K | 7 | 4 | 1.75 |
| 100 K | 9 | 3 | 3.00 |
| 150 K | 14 | 5 | 2.80 |
| 200 K | 18 | 6 | 3.00 |
| 400 K | 38 | 10 | 3.80 |
| 600 K | 61 | 15 | 4.07 |
| 800 K | 76 | 19 | 4.00 |
| 1 M | 98 | 24 | 4.08 |
| 2 M | 204 | 48 | 4.25 |
| 4 M | 424 | 112 | 3.79 |
| 6 M | 677 | 384 | 1.76 |
| 8 M | 919 | 621 | 1.48 |
| 10 M | 1132 | 776 | 1.46 |
| 100 M | 11842 | 10566 | 1.12 |

### Motivation and Context
Improves latency of quantized QDQ models with large DQs that dominate the inference latency.
Add a build option to enable defaults more appropriate for client/on-device workloads. The initial use case is to set the default thread pool `allow_spinning` policy, which we want to default to 0/false for builds targeting client/on-device workloads.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
Enable DSP queue polling when the performance profile is burst
### Description
Update Qnn default version to 2.36.1.250708

Co-authored-by: Jeff Kilpatrick <[email protected]>
Previously the machine pool had a user-assigned managed identity (UMI) that was used for accessing blob storage. The UMI has now been removed to improve security, so the data is baked into the VM image instead.
### Description
Use the license file from the QNN SDK to make sure it's up to date.

Co-authored-by: adrianlizarraga <[email protected]>
Please also cherry-pick #25523, because the old machine pool Onnxruntime-github-Linux-GPU-A100-WUS3 is gone.
To include the following change: dmlc/dlpack#165
…4802)

### Description
Currently some required ADO pipelines fail because of a version mismatch between the vcpkg build and the non-vcpkg build. This PR fixes the failing builds.
jywu-msft
approved these changes
Aug 9, 2025
snnn
approved these changes
Aug 10, 2025
Thank you!