Add run options to ONNX Runtime GenAI #1795

kunal-vaishnavi · 2025-09-26T17:52:39Z

Description

This PR allows users to customize run options per ONNX model that runs in ONNX Runtime GenAI. It also enables users to provide separate session options and provider options per ONNX model.

Usage

The run options can be added as key-value pairs in a separate, optional section within the GenAI config.

"session_options": {
    "log_id": "onnxruntime-genai",
    "use_device_allocator_for_initializers": true,
    "provider_options": [
        {
            "cuda": {
                "enable_cuda_graph": "0",
                "enable_skip_layer_norm_strict_mode": "1",
                "max_mem": "0",
                "arena_extend_strategy": "0",
                "initial_chunk_size_bytes": "5368709120",
                "max_dead_bytes_per_chunk": "0",
                "initial_growth_chunk_size_bytes": "1000000000"
            }
        }
    ]
},
"run_options": {
    "enable_memory_arena_shrinkage": "cpu:0;gpu:0"
},
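As a minimal sketch of what this looks like when a config is edited programmatically, the snippet below merges a run_options section into a config section held as a Python dict. The helper name set_run_options is hypothetical and is not part of ONNX Runtime GenAI; the keys and values are the ones shown above.

```python
import json


def set_run_options(section: dict, run_options: dict) -> dict:
    """Return a copy of a config section with run_options merged in.

    Hypothetical helper for illustration; it only edits a plain dict.
    """
    updated = dict(section)
    merged = dict(updated.get("run_options", {}))
    merged.update(run_options)
    updated["run_options"] = merged
    return updated


# A section shaped like the example above (session options trimmed for brevity).
decoder = {
    "session_options": {"log_id": "onnxruntime-genai", "provider_options": []},
}

decoder = set_run_options(decoder, {"enable_memory_arena_shrinkage": "cpu:0;gpu:0"})
print(json.dumps(decoder, indent=4))
```

Because the section is optional, a config without run_options is left valid; the helper only adds the section when asked to.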

You can also have separate run options per ONNX model within the GenAI config.

"decoder": {
    "session_options": {
        "log_id": "onnxruntime-genai",
        "provider_options": []
    },
    "run_options": {
        "enable_memory_arena_shrinkage": "cpu:0;gpu:0"
    }
},
"vision": {
    "session_options": {
        "log_id": "onnxruntime-genai",
        "provider_options": []
    },
    "run_options": {
        "enable_memory_arena_shrinkage": "cpu:0;gpu:0"
    },
    "inputs": {
        "pixel_values": "pixel_values",
        "attention_mask": "image_attention_mask",
        "image_sizes": "image_sizes"
    },
    "outputs": {
        "image_features": "image_features"
    }
},
"speech": {
    "session_options": {
        "log_id": "onnxruntime-genai",
        "provider_options": []
    },
    "run_options": {
        "enable_memory_arena_shrinkage": "cpu:0;gpu:0"
    },
    "inputs": {
        "audio_embeds": "audio_embeds",
        "attention_mask": "audio_attention_mask",
        "audio_sizes": "audio_sizes",
        "audio_projection_mode": "audio_projection_mode"
    },
    "outputs": {
        "audio_features": "audio_features"
    }
},
"embedding": {
    "session_options": {
        "log_id": "onnxruntime-genai",
        "provider_options": []
    },
    "run_options": {
        "enable_memory_arena_shrinkage": "cpu:0;gpu:0"
    },
    "inputs": {
        "input_ids": "input_ids",
        "image_features": "image_features",
        "audio_features": "audio_features"
    },
    "outputs": {
        "inputs_embeds": "inputs_embeds"
    }
},
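The per-model layout above repeats the same run options in each section. A rough sketch of filling them in with a script, assuming the model sections are available as a dict keyed by the names used above (decoder, vision, speech, embedding); apply_run_options is a hypothetical helper, not a GenAI API.

```python
# Section names taken from the example above.
MODEL_SECTIONS = ("decoder", "vision", "speech", "embedding")


def apply_run_options(model_config: dict, run_options: dict) -> dict:
    """Return a copy of model_config with run_options merged into each
    model section that is present. Other keys are left untouched."""
    updated = dict(model_config)
    for name in MODEL_SECTIONS:
        if name in updated:
            section = dict(updated[name])
            section["run_options"] = {**section.get("run_options", {}), **run_options}
            updated[name] = section
    return updated


# Illustrative input: two of the four sections, without run options yet.
model_config = {
    "decoder": {"session_options": {"log_id": "onnxruntime-genai", "provider_options": []}},
    "vision": {"session_options": {"log_id": "onnxruntime-genai", "provider_options": []}},
}

result = apply_run_options(model_config, {"enable_memory_arena_shrinkage": "cpu:0;gpu:0"})
for name, section in result.items():
    print(name, section["run_options"])
```

Sections that are absent from the config are skipped, so the same call works for text-only and multi-modal models.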

Documentation

Session Options

For a full list, please see the list of keys available here.

Provider Options

For a full list, please see your target execution provider's page inside the ONNX Runtime docs.

Run Options

For a full list, please see the list of keys available here.

Motivation and Context

This PR lets users set run options such as memory.enable_memory_arena_shrinkage to reduce memory usage in memory-constrained environments.

The table below is a quick reference for the memory savings with Phi-4 multi-modal and two example images.

Run Option	Provider Option	After First Run	After Second Run
None	None	13290 MiB	27626 MiB
"enable_memory_arena_shrinkage": "cpu:0;gpu:0"	None	11240 MiB	24552 MiB
None	"arena_extend_strategy": "0"	10978 MiB	25870 MiB
"enable_memory_arena_shrinkage": "cpu:0;gpu:0"	"arena_extend_strategy": "0"	9506 MiB	23224 MiB

It also resolves this issue.
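To put the memory table above in relative terms, the short calculation below computes the percentage saved against the baseline row (no run option, no provider option). The numbers are copied from the table; the labels and the function name are only illustrative.

```python
# Memory in MiB, (after first run, after second run), copied from the table.
BASELINE = (13290, 27626)
ROWS = {
    "enable_memory_arena_shrinkage only": (11240, 24552),
    "arena_extend_strategy only": (10978, 25870),
    "both options": (9506, 23224),
}


def saving_percent(baseline: float, value: float) -> float:
    """Percentage reduction of value relative to baseline."""
    return 100.0 * (baseline - value) / baseline


for label, (first, second) in ROWS.items():
    print(
        f"{label}: {saving_percent(BASELINE[0], first):.1f}% after first run, "
        f"{saving_percent(BASELINE[1], second):.1f}% after second run"
    )
```

With both options set, this works out to roughly 28.5% less memory after the first run and 15.9% less after the second.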

kunal-vaishnavi added 6 commits September 24, 2025 01:53

Add run options to GenAI config

5e643e1

Merge branch 'main' into kvaishnavi/run-options

4f9377e

Use CPU only for unit test

0b4be4a

Remove extra spaces for clang linter

2f0df9a

Skip unit test for DML

71b72b8

Add support for separate session options and provider options

11dd4d9

This was referenced Sep 27, 2025

Support dedicated session and provider options for each model in VLM #1699

Open

Unable to Control CPU Arena Allocator Behavior in ONNX GenAI Android Java/Kotlin (0.8.2) #1584

Open

kunal-vaishnavi added 4 commits September 30, 2025 02:29

Add control over BFC arena

4fe1605

Fix default values for arena config

20ac1da

Add log verbosity to session options

9dd8dce

Add changes suggested by C++ linter

d3b9371
