-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Scrubbing sensitive content #2014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scrubbing sensitive content #2014
Conversation
PR Change SummaryIntroduced a new flag to manage sensitive content in instrumentation settings and updated documentation accordingly.
Modified Files
How can I customize these reviews?Check out the Hyperlint AI Reviewer docs for more information on how to customize the review. If you just want to ignore it on this PR, you can add the Note specifically for link checks, we only check the first 30 links in a file and we cache the results for several hours (for instance, if you just added a page, you might experience this). Our recommendation is to add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tool call arguments also need to be omitted.
The tool call arguments also need to be excluded from the 'running tool' span. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder: The tool call arguments also need to be excluded from the 'running tool' span.
…avrdhn/pydantic-ai into scrubbing_sensitive_content
if body.get('content'): | ||
body = new_event_body() | ||
body['content'] = part.content | ||
if settings.include_content: | ||
body['content'] = part.content |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implies that the number of events produced will depend on whether content is included if multiple tool calls are made in a single message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to understand what you mean exactly
Would you say the below test captures what you are trying to imply:
def test_otel_events_consistency_with_include_content():
"""Test that the number of OpenTelemetry events is consistent regardless of include_content setting."""
# Create a response with multiple tool calls followed by text
response = ModelResponse(parts=[
ToolCallPart('tool1', {'arg1': 'value1'}, 'call_1'),
ToolCallPart('tool2', {'arg2': 'value2'}, 'call_2'),
TextPart('Some text response')
])
settings_with_content = InstrumentationSettings(include_content=True)
events_with_content = response.otel_events(settings_with_content)
settings_without_content = InstrumentationSettings(include_content=False)
events_without_content = response.otel_events(settings_without_content)
assert len(events_with_content) == len(events_without_content), (
f"Event count differs: with_content={len(events_with_content)}, "
f"without_content={len(events_without_content)}"
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, i didn't think this through properly. the difference would be if a message had multiple text parts. but i don't think this happens in practice, so it isn't worth worrying about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the difference would be if a message had multiple text parts. but i don't think this happens in practice
I'm not caught up on this whole conversation, but I do think a message with multiple text parts could be realistic. For example, multiple text parts could be used to separate different inputs:
- text part: compare these two paragraphs, which is better?
- text part: paragraph 1
- text part: paragraph 2
Of course this could be done by concatenating all text into a single part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I get it. Something like this would cause this issue.
response = ModelResponse(parts=[
TextPart('Some text response'),
ToolCallPart('tool2', {'arg2': 'value2'}, 'call_2'),
TextPart('Some more text response'),
ToolCallPart('tool1', {'arg1': 'value1'}, 'call_1'),
TextPart('Even more text response'),
])
Co-authored-by: Alex Hall <[email protected]>
Co-authored-by: Alex Hall <[email protected]>
Thanks! |
Fixes #1571
Adds include_content flag in InstrumentationSettings, default set to True
Adds include_tool_args to remove the arguments from the running tool spans when incude_content is set to False
Adds example in logfire docs