alvarobartt
Member

What does this PR do?

This PR updates how the ort inputs are handled, adding support for optional inputs such as position_ids and past_key_values.

For example, onnx-community/Qwen3-Embedding-0.6B-ONNX now runs on Text Embeddings Inference (TEI) via ort. The past_key_values are added conditionally, only when they appear among the model inputs, because the optimized Multi-head Attention (MHA) nodes require them. The following command now works, whereas on main it fails with missing inputs on past_key_values:

cargo run --release --features ort,http --no-default-features -- --model-id onnx-community/Qwen3-Embedding-0.6B-ONNX --dtype float32 --pooling last-token
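
The conditional handling described above can be illustrated with a minimal, standard-library-only sketch. It is not the code from this PR: `TensorSketch`, `Data`, `build_inputs`, the `past_key_values.` name prefix, the empty-cache shape `[batch, num_kv_heads, 0, head_dim]`, and the right-padded `position_ids` layout are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical tensor payload: integer ids or float cache values.
#[derive(Debug, Clone, PartialEq)]
pub enum Data {
    Int(Vec<i64>),
    Float(Vec<f32>),
}

/// Hypothetical stand-in for a model input (not an ort type).
#[derive(Debug, Clone, PartialEq)]
pub struct TensorSketch {
    pub shape: Vec<usize>,
    pub data: Data,
}

/// Flatten a [batch, seq_len] matrix of ids into a single tensor.
fn flatten(rows: &[Vec<i64>]) -> TensorSketch {
    let seq_len = rows.first().map_or(0, Vec::len);
    TensorSketch {
        shape: vec![rows.len(), seq_len],
        data: Data::Int(rows.iter().flatten().copied().collect()),
    }
}

/// Always feed input_ids and attention_mask; add the optional inputs
/// only when the model declares them (`declared` lists its input names).
pub fn build_inputs(
    declared: &[&str],
    input_ids: &[Vec<i64>],
    attention_mask: &[Vec<i64>],
    num_kv_heads: usize,
    head_dim: usize,
) -> BTreeMap<String, TensorSketch> {
    let batch = input_ids.len();
    let seq_len = input_ids.first().map_or(0, Vec::len);

    let mut inputs = BTreeMap::new();
    inputs.insert("input_ids".to_string(), flatten(input_ids));
    inputs.insert("attention_mask".to_string(), flatten(attention_mask));

    for &name in declared {
        if name == "position_ids" {
            // Assumption: right padding, so positions are 0..seq_len per row.
            let data: Vec<i64> = (0..batch).flat_map(|_| 0..seq_len as i64).collect();
            inputs.insert(
                name.to_string(),
                TensorSketch { shape: vec![batch, seq_len], data: Data::Int(data) },
            );
        } else if name.starts_with("past_key_values.") {
            // Assumption: an empty cache with a zero-length sequence axis.
            inputs.insert(
                name.to_string(),
                TensorSketch {
                    shape: vec![batch, num_kv_heads, 0, head_dim],
                    data: Data::Float(Vec::new()),
                },
            );
        }
    }
    inputs
}
```

With this shape of code, a model that declares neither optional input receives exactly the two base inputs, while a model that declares them gets one extra entry per declared name.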

Note

In a subsequent PR, both the input building and the pooling strategies for ONNX need to be updated to use the correct padding side. They currently default to right padding, which tends to be the norm, but some models, e.g. https://huggingface.co/onnx-community/Qwen3-Embedding-0.6B-ONNX, use left padding instead.
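
To see why the padding side matters for last-token pooling, here is a small standard-library-only sketch. `PaddingSide`, `pad`, and `last_token_index` are hypothetical names chosen for illustration and do not come from TEI; the sketch also assumes sequences are never longer than the target length, so no truncation is handled.

```rust
/// Hypothetical enum naming the side on which padding tokens are added.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PaddingSide {
    Left,
    Right,
}

/// Pad `tokens` up to `len`, returning (input_ids, attention_mask).
pub fn pad(tokens: &[i64], len: usize, pad_id: i64, side: PaddingSide) -> (Vec<i64>, Vec<i64>) {
    let n_pad = len.saturating_sub(tokens.len());
    let padding = vec![pad_id; n_pad];
    let ones = vec![1i64; tokens.len()];
    let zeros = vec![0i64; n_pad];
    match side {
        PaddingSide::Right => (
            [tokens, padding.as_slice()].concat(),
            [ones, zeros].concat(),
        ),
        PaddingSide::Left => (
            [padding.as_slice(), tokens].concat(),
            [zeros, ones].concat(),
        ),
    }
}

/// Index of the last real (non-padding) token in one row.
/// Returns None when the row contains no real tokens.
pub fn last_token_index(attention_mask: &[i64], side: PaddingSide) -> Option<usize> {
    let real = attention_mask.iter().filter(|&&m| m != 0).count();
    if real == 0 {
        return None;
    }
    match side {
        // Real tokens come first, so the last one sits at `real - 1`.
        PaddingSide::Right => Some(real - 1),
        // Real tokens come last, so the last one is the final position.
        PaddingSide::Left => Some(attention_mask.len() - 1),
    }
}
```

Applying the right-padding rule to a left-padded row would select a padding position instead of the last real token, which is the kind of mismatch the note above warns about.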


This PR also bumps ort to the latest release candidate, pending the stable v2.0.0 release, which comes with the following breaking changes:

  • ort::session::Session is now mutable, so it is placed in a Mutex
  • try_extract_tensor is now try_extract_array
  • ort::inputs! no longer accepts an ndarray, only an ort::value,
    e.g. ort::value::Tensor

Additionally, the bump requires pinning ort-sys and adding the std feature to ort. More information is available in the release notes at https://github.com/pykeio/ort/releases/tag/v2.0.0-rc.10.
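
The first breaking change in the list above (a session that needs mutable access, shared behind a Mutex) can be sketched with the standard library alone. `SessionSketch` and `BackendSketch` are hypothetical stand-ins, not ort or TEI types; the only assumption carried over is that running the session requires `&mut self`.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for a session whose `run` needs `&mut self`.
pub struct SessionSketch {
    runs: usize,
}

impl SessionSketch {
    /// Pretend inference: echo the ids back as floats and count the call.
    pub fn run(&mut self, input_ids: &[i64]) -> Vec<f32> {
        self.runs += 1;
        input_ids.iter().map(|&id| id as f32).collect()
    }
}

/// Hypothetical backend that is shared across threads by `&self`.
pub struct BackendSketch {
    session: Mutex<SessionSketch>,
}

impl BackendSketch {
    pub fn new() -> Self {
        Self { session: Mutex::new(SessionSketch { runs: 0 }) }
    }

    /// Takes `&self`, locks the mutex, and gets the `&mut` access
    /// that `run` requires for the duration of the call.
    pub fn embed(&self, input_ids: &[i64]) -> Result<Vec<f32>, String> {
        let mut session = self.session.lock().map_err(|e| e.to_string())?;
        Ok(session.run(input_ids))
    }

    /// Number of completed `run` calls, read under the same lock.
    pub fn runs(&self) -> Result<usize, String> {
        let session = self.session.lock().map_err(|e| e.to_string())?;
        Ok(session.runs)
    }
}
```

The trade-off of this design is that calls to `embed` are serialized: only one thread can hold the session at a time.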

Warning

Given the breaking changes above, and since this ort release is only a release candidate rather than a stable version, the bump will most likely not land on Text Embeddings Inference (TEI) yet. The history is kept in the PR just in case.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
  • Did you write any new necessary tests? If applicable, did you include or update the insta snapshots?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Narsil

@alvarobartt changed the title from "Add support for position_ids and past_key_values in OnnxBackend" to "Add support for position_ids and past_key_values in OrtBackend" on Aug 18, 2025
@alvarobartt requested a review from Narsil on August 19, 2025 08:47
@alvarobartt merged commit 6980437 into main on Aug 20, 2025
14 checks passed
@alvarobartt deleted the fix-ort-inputs branch on August 20, 2025 08:30