Add run options to ONNX Runtime GenAI #1795
Open
Description
This PR allows users to customize run options per ONNX model that runs in ONNX Runtime GenAI. It also enables users to provide separate session options and provider options per ONNX model.
Usage
The run options can be added as key-value pairs in a separate, optional section within the GenAI config.
You can also have separate run options per ONNX model within the GenAI config.
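The config examples that accompanied this section are not reproduced here, so the following is only a rough sketch of what per-model run options might look like. It assumes a section named run_options placed next to session_options under each model component; that name and placement, the file names, and the helper with_run_options are hypothetical and should be checked against the PR's actual schema. The key memory.enable_memory_arena_shrinkage is the one named in this PR, and the values follow the device-list format ONNX Runtime uses for that key.

```python
import json
from copy import deepcopy

# Hypothetical, trimmed-down GenAI config with two ONNX models.
# File names and the overall layout are placeholders for illustration.
base_config = {
    "model": {
        "decoder": {"filename": "decoder.onnx", "session_options": {"provider_options": []}},
        "vision": {"filename": "vision.onnx", "session_options": {"provider_options": []}},
    }
}


def with_run_options(config, component, run_options):
    """Return a copy of config with run options attached to one model component.

    ASSUMPTION: the section is called "run_options" and sits beside
    "session_options"; verify against the real GenAI config schema.
    """
    updated = deepcopy(config)
    section = updated["model"][component]
    # Run options are plain string key-value pairs; merge into any existing ones.
    section.setdefault("run_options", {}).update(run_options)
    return updated


# Separate run options per ONNX model.
config = with_run_options(
    base_config, "decoder", {"memory.enable_memory_arena_shrinkage": "cpu:0"}
)
config = with_run_options(
    config, "vision", {"memory.enable_memory_arena_shrinkage": "cpu:0;gpu:0"}
)

print(json.dumps(config, indent=2))
```

Because the section is optional, a component without run options simply omits it; the helper leaves the input config untouched so the two variants can be compared side by side.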
Documentation
Session Options
For a full list, please see the list of keys available here.
Provider Options
For a full list, please see your target execution provider's page inside the ONNX Runtime docs.
Run Options
For a full list, please see the list of keys available here.
Motivation and Context
This PR allows users to use run options such as memory.enable_memory_arena_shrinkage to reduce memory usage in memory-constrained environments. Here is a quick reference of the memory benefits for Phi-4 multi-modal with two example images.
It also resolves this issue.