⚡️ Speed up function `tensor_to_value` by 31% #35
📄 31% (0.31x) speedup for `tensor_to_value` in `google/cloud/aiplatform/_streaming_prediction.py`

⏱️ Runtime: 8.69 milliseconds → 6.65 milliseconds (best of 264 runs)

📝 Explanation and details
The optimized version achieves a ~30% speedup by eliminating redundant `ListFields()` calls and streamlining the execution path.

Key optimizations:

- Eliminated duplicate `ListFields()` call: the original code called `tensor_pb.ListFields()` twice, once to get the list and again to extract `descriptor, value`. The optimized version stores the result in `list_of_fields` and reuses it via `list_of_fields[0]`, avoiding an expensive protobuf method call.
- Cached descriptor name: instead of accessing `descriptor.name` multiple times in the conditional checks, the optimized version stores it once as `name = descriptor.name`, reducing attribute-access overhead.
- Streamlined final return logic: the original version had separate `if len(value) == 1` and `else` blocks with explicit returns. The optimized version uses a single ternary expression, `return value[0] if len(value) == 1 else value`, eliminating branching overhead.
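Taken together, the three changes might look roughly like the sketch below. This is not the library's exact code: the protobuf Tensor message is replaced by a tiny stand-in class with a `ListFields()` method so the example runs without `google-cloud-aiplatform`, and the field names (`list_val`, `struct_val`, `int_val`) are assumptions based on the description above.

```python
class _FakeDescriptor:
    def __init__(self, name):
        self.name = name


class _FakeTensor:
    """Mimics the ListFields() part of a protobuf message interface."""

    def __init__(self, field_name, value):
        self._fields = [(_FakeDescriptor(field_name), value)]

    def ListFields(self):
        return list(self._fields)


def tensor_to_value(tensor_pb):
    # One ListFields() call, result reused (the original called it twice).
    list_of_fields = tensor_pb.ListFields()
    descriptor, value = list_of_fields[0]
    # Descriptor name fetched once instead of on every comparison.
    name = descriptor.name
    if name == "list_val":
        return [tensor_to_value(x) for x in value]
    if name == "struct_val":
        return {k: tensor_to_value(v) for k, v in value.items()}
    # Single ternary replaces the separate if/else return blocks.
    return value[0] if len(value) == 1 else value


scalar = _FakeTensor("int_val", [7])
nested = _FakeTensor("list_val", [_FakeTensor("int_val", [1]),
                                  _FakeTensor("float_val", [2.0, 3.0])])
print(tensor_to_value(scalar))   # 7
print(tensor_to_value(nested))   # [1, [2.0, 3.0]]
```

The recursive calls on `list_val` and `struct_val` are where the per-invocation savings compound, since every nested element pays the `ListFields()` and attribute-lookup cost again.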
These changes are faster because `ListFields()` is a relatively expensive protobuf operation that involves introspecting the message structure, so eliminating one call per function invocation directly reduces computation. Similarly, attribute access (`descriptor.name`) carries lookup overhead in Python, and caching the name in a local variable provides faster access.
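The attribute-caching point can be illustrated with a small micro-benchmark. This is a sketch for intuition only: `Descriptor` here is a stand-in class, and actual timings vary by interpreter and machine.

```python
import timeit


class Descriptor:
    def __init__(self):
        self.name = "struct_val"


descriptor = Descriptor()


def repeated_access():
    # descriptor.name is looked up again for each comparison.
    return descriptor.name == "list_val" or descriptor.name == "struct_val"


def cached_access():
    # The attribute is fetched once into a fast local variable.
    name = descriptor.name
    return name == "list_val" or name == "struct_val"


t_repeated = timeit.timeit(repeated_access, number=500_000)
t_cached = timeit.timeit(cached_access, number=500_000)
print(f"repeated attribute access: {t_repeated:.3f}s")
print(f"cached in a local:         {t_cached:.3f}s")
```

Local-variable reads in CPython bypass the instance and class dictionary lookups that attribute access performs, which is why the cached version tends to come out ahead when the name is compared more than once.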
In practice, the optimizations show consistent 10-32% improvements across all test cases, with particularly strong gains (30%+) on large-scale scenarios involving 1000+ elements. The benefits compound in nested structures where `tensor_to_value` is called recursively many times.

✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-tensor_to_value-mgkk3x1u` and push.