Skip to content

Multimodal tool return type #1497

@kawaijoe

Description

@kawaijoe

Description

I'm not entirely sure if this is a bug or a missing feature but I think it would make sense to be able to return multimodal types. I believe currently types such as DocumentUrl gets serialized as a json.

For example:

@agent.tool_plain
def special_document() -> DocumentUrl:
  '''Retrieve a research paper for analysis.'''
  return DocumentUrl(url='https://arxiv.org/pdf/2504.07136')

Full example (DocumentUrl returned as tool)

import httpx
from google.colab import userdata
from pydantic_ai import Agent, BinaryContent, DocumentUrl
from pydantic_ai.models.gemini import GeminiModel
from pydantic_ai.providers.google_gla import GoogleGLAProvider

model = GeminiModel(
  'gemini-2.5-pro-preview-03-25',
  provider=GoogleGLAProvider(api_key=userdata.get('GEMINI_API_KEY'))
)

agent = Agent(model)

documentUrl = DocumentUrl(url='https://arxiv.org/pdf/2504.07136')

@agent.tool_plain
def special_document() -> DocumentUrl:
  '''Retrieve a research paper for analysis.'''
  return documentUrl

result = await agent.run(
  [
    'I need to read a research paper. Please use the special_document tool to get the paper and tell me its title.'
  ]
)

print('Agent response:')
print(result.output)

Agent response:
Okay, I have retrieved the research paper using the special_document tool.

However, the tool only provided a URL to the paper's PDF file: [https://arxiv.org/pdf/2504.07136](https://arxiv.org/pdf/2504.07136%60)

It did not return the content or the title of the paper itself. Therefore, I cannot tell you the title based on the information provided by the tool. You can access the paper at the URL above to read it and find its title.

Full example (DocumentUrl passed in agent.run())

import httpx
from google.colab import userdata
from pydantic_ai import Agent, BinaryContent, DocumentUrl
from pydantic_ai.models.gemini import GeminiModel
from pydantic_ai.providers.google_gla import GoogleGLAProvider

model = GeminiModel(
  'gemini-2.5-pro-preview-03-25',
  provider=GoogleGLAProvider(api_key=userdata.get('GEMINI_API_KEY'))
)

agent = Agent(model)

documentUrl = DocumentUrl(url='https://arxiv.org/pdf/2504.07136')

result = await agent.run(
  [
    'I need to read a research paper. Please use the special_document tool to get the paper and tell me its title.',
    documentUrl # Directly pass in documentUrl.
  ]
)

print('Agent response:')
print(result.output)

Agent response:
Okay, I have accessed the research paper using the special_document tool.

The title of the paper is: The spectrum of magnetized turbulence in the interstellar medium

References

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions