
zhaoxul-qti and others added 18 commits August 1, 2025 02:15
Add support for the Upsample operator to the op builder in QNN-EP.

- Enhance QNN-EP support for Upsample operator.
- Add unit test for Upsample operator in QNN-EP.
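For reference, ONNX `Upsample` in its default `nearest` mode with integer scale factors repeats each input element `scale` times along each axis. The sketch below illustrates only that operator behavior on a 2-D grid of nested lists. It assumes integer scales (fractional scales and `linear` mode are not covered), the helper name is hypothetical, and it says nothing about how the QNN-EP op builder maps the op.

```python
def upsample_nearest_2d(image, scale_h, scale_w):
    """Nearest-neighbor upsampling of a 2-D grid by integer scale factors.

    Each output cell (r, c) copies input cell (r // scale_h, c // scale_w),
    so every input element is repeated scale_h x scale_w times.
    """
    out_h = len(image) * scale_h
    out_w = len(image[0]) * scale_w
    return [
        [image[r // scale_h][c // scale_w] for c in range(out_w)]
        for r in range(out_h)
    ]


print(upsample_nearest_2d([[1, 2], [3, 4]], 2, 2))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```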
[QNN EP] Add Einsum support for some equations. The intent is not to support all equations, but to enable them case by case to improve performance.
…#24640)

### Description
Enable `use_vcpkg` for the QNN NuGet package build and the Python arm64ec build.
### Description
Add LSTM support for QNN EP

### Motivation and Context
Add LSTM support for QNN EP
### Description
Update the QNN default version to 2.34.0.250424
The QNN [Softmax op defines a pre-scale (`beta`)](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/MasterOpDef.html#softmax) parameter, so a multiplication by a constant scalar can be folded into it.
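A minimal sketch of why the fold is valid, assuming `beta` scales the input before the exponential, i.e. softmax(x)_i = exp(beta * x_i) / sum_j exp(beta * x_j). Under that assumption, `Mul(x, c)` followed by `Softmax` gives the same result as a single `Softmax` with `beta = c`, as long as `c` is a constant scalar (a per-element multiplier could not be folded this way). The helper names are hypothetical and this is not the QNN EP code.

```python
import math


def softmax(xs, beta=1.0):
    """1-D softmax with an input pre-scale `beta` (assumed QNN semantics)."""
    scaled = [beta * x for x in xs]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mul_then_softmax(xs, c):
    """Unfused graph: Mul by a constant scalar, then Softmax with beta = 1."""
    return softmax([c * x for x in xs])


def folded_softmax(xs, c):
    """Fused graph: the constant scalar becomes Softmax's beta."""
    return softmax(xs, beta=c)


x = [0.5, -1.0, 2.0, 3.5]
c = 0.125  # an arbitrary constant scalar multiplier
unfused = mul_then_softmax(x, c)
fused = folded_softmax(x, c)
print(all(math.isclose(a, b) for a, b in zip(unfused, fused)))  # True
```

Folding removes one elementwise node from the graph, and the Mul's work is absorbed into the scaling that Softmax performs anyway.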
### Description
Remove the `ep_weight_sharing_ctx_gen` tool from the QNN EP Python wheel.
- Registered CumSum op in QNN EP
- Added unit test to verify accuracy and assignment of op to QNN EP

### Description
Added support for CumSum in QNN EP.

### Motivation and Context
QNN EP did not previously support the CumSum op.
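ONNX `CumSum` computes a running sum along one axis, and two attributes change its meaning: `exclusive` (each output omits the current element) and `reverse` (accumulate from the end). The 1-D sketch below reproduces the examples from the ONNX operator documentation. It illustrates the operator semantics only; it is not the QNN EP implementation, and this PR does not state which attribute combinations QNN EP accepts.

```python
def cumsum(xs, exclusive=False, reverse=False):
    """1-D reference for ONNX CumSum's `exclusive` and `reverse` attributes."""
    seq = list(reversed(xs)) if reverse else list(xs)
    out, running = [], 0
    for v in seq:
        if exclusive:
            out.append(running)  # sum of the elements strictly before v
            running += v
        else:
            running += v
            out.append(running)  # sum including v
    # Undo the reversal so outputs line up with the original positions.
    return list(reversed(out)) if reverse else out


x = [1, 2, 3]
print(cumsum(x))                                # [1, 3, 6]
print(cumsum(x, exclusive=True))                # [0, 1, 3]
print(cumsum(x, reverse=True))                  # [6, 5, 3]
print(cumsum(x, exclusive=True, reverse=True))  # [5, 3, 0]
```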
### Description
Update the QNN default version to 2.35.0.250530
* Re-enable tests and remove workarounds that were introduced as part of a QNN <= 2.31 upgrade but are no longer necessary.

QNN/QAIRT releases about once a month. As ONNX Runtime adopts these new versions, some tests are often found to be affected.
Consequently, tests are skipped and tolerances are loosened. This change reverts as many as possible of the workarounds made for QNN upgrades between 2.17 and 2.31, inclusive. The most recent few releases were intentionally not examined, to minimize impact on users of older versions and to avoid lock-in to the bleeding edge.

---------

Co-authored-by: Jeff Kilpatrick <[email protected]>
### Description

Update the QNN default version to 2.36.0.250627
### Description
- Adds multithreaded vectorized implementations of DequantizeLinear for
int8 and uint8 inputs:
  - Intel SSE 2
  - ARM NEON
- All other architectures fall back to a multithreaded scalar reference
implementation (the previous implementation was not multithreaded).
- **Note**: only enabled if ORT is built for client/on-device workloads
(`ORT_CLIENT_PACKAGE_BUILD` is defined).
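The ONNX DequantizeLinear operator computes `y = (x - x_zero_point) * x_scale`. A minimal sketch of the partitioning idea behind a multithreaded path, assuming per-tensor quantization: the input is split into contiguous blocks and each block is dequantized independently on a thread pool. The function names and `block_size` are hypothetical, there is no SIMD here, and this is not ORT's actual kernel. CPython's GIL also means these threads illustrate the work split rather than deliver a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def dequantize_block(src, dst, start, end, scale, zero_point):
    """Scalar reference for one block: y = (x - zero_point) * scale."""
    for i in range(start, end):
        dst[i] = (src[i] - zero_point) * scale


def dequantize_linear_mt(x, scale, zero_point, num_threads=4, block_size=4096):
    """Per-tensor DequantizeLinear split into independent contiguous blocks."""
    n = len(x)
    y = [0.0] * n
    blocks = [(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [
            pool.submit(dequantize_block, x, y, s, e, scale, zero_point)
            for s, e in blocks
        ]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return y


q = [-128, -1, 0, 1, 127] * 3000  # int8-range values, 15000 elements
y = dequantize_linear_mt(q, scale=0.05, zero_point=-3)
print(len(y), y[:5])
```

Because each output element depends only on its own input element, the blocks need no synchronization beyond waiting for all of them to finish; block size trades scheduling overhead against load balance.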

INT8 DequantizeLinear latency on Intel Core i9-10920X with 4 intra op
threads (SSE 2 implementation)

| Number of elements | Baseline latency (µs) | Multithreaded+SIMD latency (µs) | Speedup |
| --- | --- | --- | --- |
| 10 K | 1 | 1 | 1 |
| 20 K | 2 | 2 | 1 |
| 40 K | 5 | 5 | 1 |
| 80 K | 11 | 4 | 2.75 |
| 100 K | 14 | 5 | 2.80 |
| 150 K | 21 | 7 | 3.00 |
| 200 K | 28 | 8 | 3.50 |
| 400 K | 68 | 15 | 4.53 |
| 600 K | 107 | 21 | 5.10 |
| 800 K | 142 | 28 | 5.07 |
| 1 M | 187 | 42 | 4.45 |
| 2 M | 376 | 102 | 3.69 |
| 4 M | 880 | 236 | 3.73 |
| 6 M | 1547 | 557 | 2.78 |
| 8 M | 2438 | 1097 | 2.22 |
| 10 M | 3192 | 1464 | 2.18 |
| 100 M | 38718 | 17733 | 2.18 |

INT8 DequantizeLinear latency on Snapdragon 8cx gen 3 @ 3.4GHz with 4
intra op threads (NEON implementation)

| Number of elements | Baseline latency (µs) | Multithreaded+SIMD latency (µs) | Speedup |
| --- | --- | --- | --- |
| 10 K | 1 | 1 | 1 |
| 20 K | 1 | 1 | 1 |
| 40 K | 3 | 3 | 1 |
| 80 K | 7 | 4 | 1.75 |
| 100 K | 9 | 3 | 3.00 |
| 150 K | 14 | 5 | 2.80 |
| 200 K | 18 | 6 | 3.00 |
| 400 K | 38 | 10 | 3.80 |
| 600 K | 61 | 15 | 4.07 |
| 800 K | 76 | 19 | 4.00 |
| 1 M | 98 | 24 | 4.08 |
| 2 M | 204 | 48 | 4.25 |
| 4 M | 424 | 112 | 3.79 |
| 6 M | 677 | 384 | 1.76 |
| 8 M | 919 | 621 | 1.48 |
| 10 M | 1132 | 776 | 1.46 |
| 100 M | 11842 | 10566 | 1.12 |
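The Speedup column in both tables appears to be the baseline latency divided by the multithreaded+SIMD latency, rounded to two decimals. The short script below recomputes a few rows as a sanity check; the values are copied from the tables above and the row selection is arbitrary.

```python
# (elements, baseline_us, multithreaded_simd_us, reported_speedup), copied from the tables
sse2_rows = [
    ("80 K", 11, 4, 2.75),
    ("400 K", 68, 15, 4.53),
    ("100 M", 38718, 17733, 2.18),
]
neon_rows = [
    ("600 K", 61, 15, 4.07),
    ("6 M", 677, 384, 1.76),
    ("100 M", 11842, 10566, 1.12),
]


def speedup(baseline_us, optimized_us):
    """Ratio of baseline latency to optimized latency (higher is better)."""
    return baseline_us / optimized_us


for name, rows in (("SSE 2", sse2_rows), ("NEON", neon_rows)):
    for elems, base, opt, reported in rows:
        computed = round(speedup(base, opt), 2)
        print(f"{name} {elems}: computed {computed}, reported {reported}")
```

Read this way, the NEON speedup drops from about 4x at 2 M elements to about 1.1x at 100 M, while SSE 2 stays near 2.2x at the largest sizes.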
### Motivation and Context
Improves the latency of quantized QDQ models whose inference latency is
dominated by large DequantizeLinear (DQ) nodes.
Add a build option that enables default settings better suited to
client/on-device workloads.
The initial use case is the default thread pool `allow_spinning`
policy, which should default to 0/false in builds targeted at
client/on-device workloads.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
Enable DSP queue polling when the performance profile is set to burst.
### Description

Update the QNN default version to 2.36.1.250708

Co-authored-by: Jeff Kilpatrick <[email protected]>
Previously, the machine pool had a user-assigned managed identity (UMI)
that was used to access blob storage. The UMI has now been removed
to improve security, so the data is baked into the VM image
instead.
### Description
Use the license file from QNN SDK to make sure it's up to date.

---------

Co-authored-by: adrianlizarraga <[email protected]>
Member

snnn commented Aug 1, 2025

Please check whether these two changes are included: #24760 and #24802. They might be needed to make the Linux Android Emulator QNN CI Pipeline pass.

Member

snnn commented Aug 2, 2025

Please also cherry-pick #25523, because the old machine pool Onnxruntime-github-Linux-GPU-A100-WUS3 is gone.

snnn and others added 6 commits August 5, 2025 12:03
…4802)

### Description

Currently, some required ADO pipelines fail because of a version mismatch
between the vcpkg and non-vcpkg builds. This PR fixes the failing
builds.

### Motivation and Context
adrianlizarraga marked this pull request as ready for review August 8, 2025 23:08
adrianlizarraga requested review from a team as code owners August 8, 2025 23:08
Member

snnn left a comment

Thank you!

adrianlizarraga merged commit 19f37d7 into rel-1.22.2 Aug 11, 2025
72 of 98 checks passed
adrianlizarraga deleted the adrianl/rel-1.22.2/cherrypick-1 branch August 11, 2025 18:07