Description
The latest VLMs generally consist of multiple models, such as Vision, Embedding, and LLM. Efficient acceleration typically requires configuring these models with different session options and provider options. This gives flexibility when multiple hardware resources are available (for example, on an AI PC with a CPU, GPU, and NPU), so that each model can have its own configuration and potentially run on different hardware.
This doesn’t seem possible today. In multi_modal.cpp#L78, the embedding model’s session options are strictly tied to the text model’s session options, and the same applies to the vision model. If we try to set separate session options for either the embedding or vision model, the flow errors out.
Fundamentally, ONNX Runtime supports multiple models through different sessions, with each session having its own session options and provider options. Since ONNX Runtime GenAI is built on top of ONNX Runtime, it should allow developers to do the same for VLMs, which are typically composed of multiple sub-models.
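For reference, here is a minimal sketch of what plain ONNX Runtime already allows through its Python API: three independent sessions, each with its own session options and execution providers. The model file names, provider choices, and option values below are illustrative assumptions, not part of any existing GenAI configuration.

```python
import onnxruntime as ort

# Illustrative file names for the three sub-models of a VLM.
VISION_MODEL = "vision.onnx"
EMBEDDING_MODEL = "embedding.onnx"
LLM_MODEL = "llm.onnx"

# Vision encoder on the GPU via the CUDA execution provider.
vision_opts = ort.SessionOptions()
vision_opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
vision_session = ort.InferenceSession(
    VISION_MODEL,
    sess_options=vision_opts,
    providers=[("CUDAExecutionProvider", {"device_id": 0}), "CPUExecutionProvider"],
)

# Embedding model on the CPU with its own threading configuration.
embedding_opts = ort.SessionOptions()
embedding_opts.intra_op_num_threads = 4
embedding_session = ort.InferenceSession(
    EMBEDDING_MODEL,
    sess_options=embedding_opts,
    providers=["CPUExecutionProvider"],
)

# LLM on the NPU via the QNN execution provider, with CPU fallback.
llm_opts = ort.SessionOptions()
llm_session = ort.InferenceSession(
    LLM_MODEL,
    sess_options=llm_opts,
    providers=[
        ("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"}),
        "CPUExecutionProvider",
    ],
)
```

Exposing an analogous per-model configuration in ONNX Runtime GenAI (for example, through genai_config.json or the options API) would let each sub-model of a VLM be placed on the device that suits it best.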
Thank you