Description
The latest VLMs generally consist of multiple models, such as Vision, Embedding, and LLM. Efficient acceleration typically requires configuring these models with different session options and provider options. This gives flexibility when multiple hardware resources are available (for example, on an AI PC with a CPU, GPU, and NPU), so that each model can have its own configuration and potentially run on different hardware.
This doesn’t seem possible today. In multi_modal.cpp#L78, the embedding model’s session options are strictly tied to the text model’s session options, and the same applies to the vision model. If we try to set separate session options for either the embedding or vision model, the flow errors out.
Fundamentally, ONNX Runtime supports multiple models through different sessions, with each session having its own session options and provider options. Since ONNX Runtime GenAI is built on top of ONNX Runtime, it should allow developers to do the same for VLMs, which are typically composed of multiple sub-models.
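For reference, here is a minimal sketch of what plain ONNX Runtime already allows through its Python API: three independent sessions, each with its own session options and execution providers. The model file names, provider choices, and option values below are illustrative assumptions, not part of any existing GenAI configuration.

```python
import onnxruntime as ort

# Illustrative file names for the three sub-models of a VLM.
VISION_MODEL = "vision.onnx"
EMBEDDING_MODEL = "embedding.onnx"
LLM_MODEL = "llm.onnx"

# Vision encoder on the GPU via the CUDA execution provider.
vision_opts = ort.SessionOptions()
vision_opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
vision_session = ort.InferenceSession(
    VISION_MODEL,
    sess_options=vision_opts,
    providers=[("CUDAExecutionProvider", {"device_id": 0}), "CPUExecutionProvider"],
)

# Embedding model on the CPU with its own threading configuration.
embedding_opts = ort.SessionOptions()
embedding_opts.intra_op_num_threads = 4
embedding_session = ort.InferenceSession(
    EMBEDDING_MODEL,
    sess_options=embedding_opts,
    providers=["CPUExecutionProvider"],
)

# LLM on the NPU via the QNN execution provider, with CPU fallback.
llm_opts = ort.SessionOptions()
llm_session = ort.InferenceSession(
    LLM_MODEL,
    sess_options=llm_opts,
    providers=[
        ("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"}),
        "CPUExecutionProvider",
    ],
)
```

Exposing an analogous per-model configuration in ONNX Runtime GenAI (for example, through genai_config.json or the options API) would let each sub-model of a VLM be placed on the device that suits it best.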
Thank you