rel-1.22.2 cherry-pick 1 #25633
Merged
Conversation
Add support for the Upsample operator to the op builder in QNN-EP.
- Enhance QNN-EP support for the Upsample operator.
- Add a unit test for the Upsample operator in QNN-EP.
[QNN EP] Add Einsum support for some equations. The intent is not to support all equations, but to enable them case by case to improve performance.
…#24640)

### Description
Enable use_vcpkg for the QNN Nuget package build and the Python arm64ec build.
NFC, move file location only.
### Description
Add LSTM support for QNN EP

### Motivation and Context
Add LSTM support for QNN EP
### Description
Update Qnn default version to 2.34.0.250424
The QNN [Softmax op defines a pre-scale parameter (`beta`)](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/MasterOpDef.html#softmax) into which we can fold a constant scalar multiply.
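To illustrate why this fold is valid, here is a minimal NumPy sketch (illustrative only, not QNN-EP code; the `softmax` helper below is hypothetical): scaling the input by a constant `c` before Softmax is equivalent to multiplying Softmax's `beta` by `c`, so `Mul(c) -> Softmax(beta=b)` can be rewritten as `Softmax(beta=b*c)`.

```python
import numpy as np

def softmax(x, beta=1.0):
    # Softmax with a pre-scale parameter beta: softmax(beta * x)
    z = beta * x
    z = z - np.max(z)          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

x = np.array([0.5, -1.0, 2.0])
c = 3.0

# Unfused graph: Mul(c) followed by Softmax(beta=1)
unfused = softmax(c * x, beta=1.0)

# Fused graph: the constant multiply folded into beta
fused = softmax(x, beta=c)

assert np.allclose(unfused, fused)
```

The equivalence holds because `beta` simply scales the Softmax input, which is exactly what the preceding constant multiply does.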
### Description
Remove the ep_weight_sharing_ctx_gen tool from the QNN EP python wheel
### Description
Added support for CumSum in QNN EP:
- Registered the CumSum op in QNN EP
- Added a unit test to verify accuracy and assignment of the op to QNN EP

### Motivation and Context
There was no support for the CumSum op in QNN EP
### Description
Update Qnn default version to 2.35.0.250530
* Re-enable tests and remove workarounds that were introduced as part of QNN <= 2.31 upgrades but are no longer necessary.

QNN/QAIRT releases about once a month. As ONNX Runtime adopts these new versions, some number of tests are often found to be impacted. Consequently, tests are skipped and tolerances are loosened. This change reverts as many as possible of the workarounds that were made for QNN upgrades between 2.17 and 2.31, inclusive. The most recent few releases were intentionally not examined, to minimize impact on users on old versions and to avoid lock-in to the bleeding edge.

Co-authored-by: Jeff Kilpatrick <[email protected]>
### Description
Update Qnn default version to 2.36.0.250627
### Description
- Adds multithreaded vectorized implementations of DequantizeLinear for int8 and uint8 inputs:
  - Intel SSE 2
  - ARM NEON
- All other architectures fall back to a multithreaded scalar reference implementation (the previous one was not multithreaded).
- **Note**: only enabled if ORT is built for client/on-device workloads (`ORT_CLIENT_PACKAGE_BUILD` is defined).

INT8 DequantizeLinear latency on Intel Core i9-10920X with 4 intra-op threads (SSE 2 implementation):

| Number of elements | Baseline latency (us) | Multithreaded+SIMD latency (us) | Speedup |
| --- | --- | --- | --- |
| 10 K | 1 | 1 | 1 |
| 20 K | 2 | 2 | 1 |
| 40 K | 5 | 5 | 1 |
| 80 K | 11 | 4 | 2.75 |
| 100 K | 14 | 5 | 2.80 |
| 150 K | 21 | 7 | 3.00 |
| 200 K | 28 | 8 | 3.50 |
| 400 K | 68 | 15 | 4.53 |
| 600 K | 107 | 21 | 5.10 |
| 800 K | 142 | 28 | 5.07 |
| 1 M | 187 | 42 | 4.45 |
| 2 M | 376 | 102 | 3.69 |
| 4 M | 880 | 236 | 3.73 |
| 6 M | 1547 | 557 | 2.78 |
| 8 M | 2438 | 1097 | 2.22 |
| 10 M | 3192 | 1464 | 2.18 |
| 100 M | 38718 | 17733 | 2.18 |

INT8 DequantizeLinear latency on Snapdragon 8cx gen 3 @ 3.4GHz with 4 intra-op threads (NEON implementation):

| Number of elements | Baseline latency (us) | Multithreaded+SIMD latency (us) | Speedup |
| --- | --- | --- | --- |
| 10 K | 1 | 1 | 1 |
| 20 K | 1 | 1 | 1 |
| 40 K | 3 | 3 | 1 |
| 80 K | 7 | 4 | 1.75 |
| 100 K | 9 | 3 | 3.00 |
| 150 K | 14 | 5 | 2.80 |
| 200 K | 18 | 6 | 3.00 |
| 400 K | 38 | 10 | 3.80 |
| 600 K | 61 | 15 | 4.07 |
| 800 K | 76 | 19 | 4.00 |
| 1 M | 98 | 24 | 4.08 |
| 2 M | 204 | 48 | 4.25 |
| 4 M | 424 | 112 | 3.79 |
| 6 M | 677 | 384 | 1.76 |
| 8 M | 919 | 621 | 1.48 |
| 10 M | 1132 | 776 | 1.46 |
| 100 M | 11842 | 10566 | 1.12 |

### Motivation and Context
Improves latency of quantized QDQ models with large DQs that dominate the inference latency.
Add a build option to enable defaults more appropriate for client/on-device workloads. The initial use case is to set the default thread pool `allow_spinning` policy, which we want to default to 0/false for builds targeting client/on-device workloads.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
Enable DSP queue polling when the performance profile is burst
### Description
Update Qnn default version to 2.36.1.250708

Co-authored-by: Jeff Kilpatrick <[email protected]>
Previously the machine pool had a user-assigned managed identity (UMI) that was used for accessing blob storage. The UMI has now been removed to improve security, so the data is baked into the VM image instead.
### Description
Use the license file from the QNN SDK to make sure it's up to date.

Co-authored-by: adrianlizarraga <[email protected]>
Please also cherry-pick #25523, because the old machine pool Onnxruntime-github-Linux-GPU-A100-WUS3 is gone.
To include the following change: dmlc/dlpack#165
…4802)

### Description
Currently some required ADO pipelines fail because of a version mismatch between the vcpkg build and the non-vcpkg build. This PR fixes the failing builds.
jywu-msft
approved these changes
Aug 9, 2025
snnn
approved these changes
Aug 10, 2025
Thank you!