Description
Model classes were built for a specific API and model family, encoding not just the hard API spec but also various behaviors related to those specific models' capabilities and limitations, e.g. what JSON schemas or multi-modal input types are supported.
This breaks down when a model class is used with a different API or a different family of models that don't match all of those hard and soft assumptions:
- `OpenAIModel` is used with various ostensibly-OpenAI-compatible APIs and a wide variety of models.
- `BedrockModel` and `GroqModel` are used with a wide variety of models.
So far, the biggest class of issues (at least one filed a day) has been in JSON schema handling. `OpenAIModel` and `GeminiModel` both implement their own transformers (OpenAI, Gemini), but people using `OpenAIModel` and `BedrockModel` with other models have been running into API errors when using particular models, even if others on the same provider may work fine (suggesting this really is model-specific, and not something that an OpenAI-compatible API "should" handle consistently across all models). Resolving this currently requires manually defining a model subclass and applying one of the existing JSON schema transformers, sometimes with tweaks; see the sketch after this list:
- `OpenAIModel` + Together.xyz + Qwen fails, but works with Llama: Erratic performance when using nested schemas #1659
- `OpenAIModel` + OpenRouter + Gemini fails: Different behaviour with Gemini models using OpenAI+OpenRouter #1735
- `BedrockModel` + Nova fails, but works with others: Amazon Nova (Bedrock) limitations with tool schema #1623
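To make the "transformer" part concrete, here's a minimal standalone sketch of the kind of rewriting involved. This is not pydantic-ai's actual implementation, and the function name is made up: some models reject schemas that use `$ref`/`$defs`, so a transformer inlines the definitions before the schema is sent.

```python
# Standalone sketch of a JSON schema transformer -- NOT pydantic-ai's actual
# implementation. Some models reject schemas containing $ref/$defs, so this
# inlines local definitions before the schema is sent to the API.
from typing import Any


def inline_refs(schema: dict[str, Any]) -> dict[str, Any]:
    defs = schema.get('$defs', {})

    def walk(node: Any) -> Any:
        if isinstance(node, dict):
            ref = node.get('$ref', '')
            if isinstance(ref, str) and ref.startswith('#/$defs/'):
                # recursive schemas would need cycle detection, omitted here
                return walk(defs[ref.removeprefix('#/$defs/')])
            return {k: walk(v) for k, v in node.items() if k != '$defs'}
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node

    return walk(schema)
```

The real transformers also deal with things like `additionalProperties` and unsupported formats; the point is that which rewrites are needed depends on the model, not just on the API.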
We're going to run into something similar with Structured Output Modes (see the mode-selection sketch after this list):
- `OpenAIModel` + pre-4o doesn't support `json_schema`, only `json_object` (and tool calls and manual JSON)
- Some models don't support `json_schema` or `json_object`, only tool calls or manual JSON
- Some models don't support tool calls at all, only manual JSON
- `GeminiModel` (and presumably `BedrockModel` + Gemini) doesn't support `json_schema` alongside tools
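To sketch how the profile could drive this, a mode-selection helper could pick the best mode a given model supports. This is illustrative only; the field values follow the proposal further down, and the preference order is just an assumption:

```python
# Illustrative only: pick the richest structured-output mode a model supports.
from typing import Literal

OutputMode = Literal['tool', 'json_schema', 'json_object', 'manual_json']

# assumed preference order, richest first -- not settled anywhere
_PREFERENCE: list[OutputMode] = ['json_schema', 'tool', 'json_object', 'manual_json']


def choose_output_mode(supported: set[OutputMode], requested: OutputMode | None = None) -> OutputMode:
    """Pick the requested mode if the model supports it, else the best fallback."""
    if requested is not None:
        if requested not in supported:
            raise ValueError(f'model does not support output mode {requested!r}')
        return requested
    return next(mode for mode in _PREFERENCE if mode in supported)
```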
Built-in tools may also be different between providers used with the same model class (e.g. `OpenAIModel` + OpenRouter), or between models on the same provider (as some models may not support tool calls at all):
- OpenAI supports web search, file search, computer use https://platform.openai.com/docs/guides/tools
- Anthropic supports web search https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/web-search-tool
- Gemini supports code execution https://ai.google.dev/gemini-api/docs/code-execution
That's not the end of it, unfortunately, as we've already seen some other axes where different models may need different handling to get the most out of them (see the `strict` sketch after this list):
- Different models may need to be prompted differently to be most effective at outputting JSON: Structured outputs as an alternative to Tool Calling #582 (comment)
- `BedrockModel` + some models don't support tool use: https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference-supported-models-features.html
- `OpenAIModel` + PPInfra + Qwen doesn't support `strict` on tool definitions: Inferred strict=true may cause compatibility issues with OpenAI-compatible servers #1561
- `OpenAIModel` + Grok doesn't support `strict` on tool definitions: `strict` mode on function call is currently not supported for grok models #1846
- `BedrockModel` + Claude doesn't support tool choice: https://github.com/pydantic/pydantic-ai/blob/main/pydantic_ai_slim/pydantic_ai/models/bedrock.py#L305
- `BedrockModel` + Mistral (and others?) require tool results to be passed as an object instead of a string: Support Tool Calling with Llama 3.3 on Bedrock #1649
- With model classes that cover many models, like `OpenAIModel` and `BedrockModel`, not all will support all multi-modal input types (video, audio, image, docs).
- Claude doesn't natively do parallel tool calls and recommends providing an explicit `batch` tool: New Common Tool: Batch #1769
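As one concrete example of how a profile flag would change the request payload, a `strict_tools` field could gate whether `strict` is set on an OpenAI-style tool definition. A sketch; the helper name is made up:

```python
# Illustrative: a profile flag gating `strict` on an OpenAI-style tool
# definition, for providers that reject the field.
from typing import Any


def tool_definition(name: str, parameters: dict[str, Any], *, strict_tools: bool) -> dict[str, Any]:
    """Build an OpenAI-style tool definition, emitting `strict` only when supported."""
    function: dict[str, Any] = {'name': name, 'parameters': parameters}
    if strict_tools:
        # some providers (PPInfra + Qwen, Grok) reject this field entirely
        function['strict'] = True
    return {'type': 'function', 'function': function}
```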
I think it's time to pull some model and model family-specific details out of the model classes, generalize them, and allow them to be tweaked on a model-by-model basis.
This'll be somewhat similar to `ModelSettings`, but instead of properties to be passed directly to the model API, these new properties will determine how PydanticAI builds its request payload to get the most out of each specific model and work around limitations.
There'd be global defaults, model class/family defaults layered on top of that, model-specific overrides provided by the model class file, and the ability for users to tweak the settings further, or even use the settings defined by one model class (e.g. `GeminiModel`'s specification for 2.5 Pro) with another model class (like `OpenAIModel` + OpenRouter + Gemini).
Because we're basically describing how the model likes to be talked to, I'm leaning towards the name `ModelProfile` or `ModelSpec` or something similar -- but very open to other suggestions.
It'd look something like this:
```python
from dataclasses import dataclass, replace
from typing import Any, Literal


@dataclass
class ModelProfile:
    json_schema_transformer: Literal['openai', 'gemini'] | type[WalkJsonSchema]
    supported_output_modes: set[Literal['tool', 'json_schema', 'json_object', 'manual_json']]
    default_output_mode: Literal['tool', 'json_schema', 'json_object', 'manual_json']
    # definitely not all necessary right away, but to give you an idea
    built_in_tools: dict[str, dict[str, Any]]
    manual_json_prompt: str
    tool_use: bool
    strict_tools: bool
    tool_choice: bool
    tool_result_type: Literal['string', 'object']
    multi_modal_input_types: set[Literal['video', 'audio', 'image', 'docs']]
    offer_batch_tool: bool


# models/__init__.py
DEFAULT_PROFILE = ModelProfile(...)

# models/openai.py
DEFAULT_OPENAI_PROFILE = replace(DEFAULT_PROFILE, json_schema_transformer='openai', ...)
OPENAI_PROFILES = {}
OPENAI_PROFILES['gpt-4'] = replace(DEFAULT_OPENAI_PROFILE, supported_output_modes={'tool', 'json_object', 'manual_json'})
OPENAI_PROFILES['gpt-4o'] = replace(OPENAI_PROFILES['gpt-4'], supported_output_modes={'tool', 'json_schema', 'manual_json'})

# models/gemini.py
DEFAULT_GEMINI_PROFILE = replace(DEFAULT_PROFILE, json_schema_transformer='gemini', ...)
GEMINI_PROFILES = {}
GEMINI_PROFILES['gemini-2.0-flash-001'] = replace(DEFAULT_GEMINI_PROFILE)

# models/anthropic.py
DEFAULT_ANTHROPIC_PROFILE = replace(DEFAULT_PROFILE, ...)
ANTHROPIC_PROFILES = {}
ANTHROPIC_PROFILES['claude-3-5-sonnet-20240620'] = replace(DEFAULT_ANTHROPIC_PROFILE, ...)

# models/bedrock.py
DEFAULT_BEDROCK_PROFILE = replace(DEFAULT_PROFILE)
BEDROCK_PROFILES = {}
BEDROCK_PROFILES['us.anthropic.claude-3-5-sonnet-20240620'] = ANTHROPIC_PROFILES['claude-3-5-sonnet-20240620']  # or some cleverer way to read these automatically based on name

# my_agent.py
model = OpenAIModel(model_name='gpt-4o')

model = OpenAIModel(
    "google/gemini-2.0-flash-001",
    provider=OpenAIProvider(base_url="https://openrouter.ai/api/v1", ...),
    profile=GEMINI_PROFILES['gemini-2.0-flash-001'],
)

model = AnthropicModel(model_name='claude-3-5-sonnet-20240620')

model = BedrockModel(model_name='llama3.3', profile=replace(DEFAULT_PROFILE, json_schema_transformer='gemini'))

# could also work, if we merge in the defaults (or just set those on the dataclass/pydantic model?)
model = BedrockModel(model_name='llama3.3', profile=ModelProfile(json_schema_transformer='gemini'))
```
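To flesh out the "merge in the defaults" comment from the last two lines: one option is for every field to default to `None`, meaning "unset", with unset fields filled in from the base profile. A sketch on a trimmed-down stand-in for `ModelProfile`:

```python
# Sketch of the 'merge in the defaults' idea on a trimmed-down profile:
# None means 'unset', and unset fields inherit from the base profile.
from dataclasses import dataclass, fields, replace


@dataclass
class PartialProfileExample:  # stand-in for ModelProfile, not the proposed class
    json_schema_transformer: str | None = None
    strict_tools: bool | None = None


def merged(base: PartialProfileExample, override: PartialProfileExample) -> PartialProfileExample:
    """Return base with every non-None field of override applied on top."""
    updates = {
        f.name: value
        for f in fields(override)
        if (value := getattr(override, f.name)) is not None
    }
    return replace(base, **updates)


base = PartialProfileExample(json_schema_transformer='openai', strict_tools=True)
assert merged(base, PartialProfileExample(strict_tools=False)).json_schema_transformer == 'openai'
```

The downside is that `None` can't then mean anything else, which is an argument for just setting the defaults directly on the dataclass/pydantic model, as the comment suggests.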
I'd start by implementing this for `json_schema_transformer`, as that's the main one causing issues today, but since we have the output modes in the pipeline, I'd rather implement this with a new class from the get-go rather than with a `json_schema_transformer` argument directly set on `Model`.
@dmontagu @Kludex Thoughts? :)