
Support dedicated session and provider options for each model in VLM #1699

@uday610

Description


The latest VLMs generally consist of multiple sub-models, such as vision, embedding, and LLM components. Efficient acceleration typically requires configuring each of these models with its own session options and provider options. This gives flexibility when multiple hardware resources are available (for example, on an AI PC with a CPU, GPU, and NPU), so that each model can have its own configuration and potentially run on different hardware.

This doesn't seem possible today. In the code at multi_modal.cpp#L78, the embedding model's session options are strictly tied to the text model's session options, and the same applies to the vision model. Attempting to set separate session options for either the embedding or vision model makes the flow error out.

Fundamentally, ONNX Runtime supports multiple models through different sessions, with each session having its own session options and provider options. Since ONNX Runtime GenAI is built on top of ONNX Runtime, it should allow developers to do the same for VLMs, which are typically composed of multiple sub-models.
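As a sketch of what this could look like at the configuration level, each sub-model section in `genai_config.json` could carry its own `session_options` block, mirroring what ONNX Runtime already allows per session. This is an illustrative example, not the current API: the exact key names, provider names, and provider option fields shown here are assumptions.

```json
{
  "model": {
    "vision": {
      "filename": "vision.onnx",
      "session_options": {
        "provider_options": [ { "dml": {} } ]
      }
    },
    "embedding": {
      "filename": "embedding.onnx",
      "session_options": {
        "provider_options": []
      }
    },
    "decoder": {
      "filename": "model.onnx",
      "session_options": {
        "intra_op_num_threads": 4,
        "provider_options": [ { "cuda": {} } ]
      }
    }
  }
}
```

With something like this, the vision encoder could target one execution provider while the decoder targets another, instead of all three inheriting the decoder's session options.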

Thank you
