codeflash-ai bot commented on Oct 10, 2025

📄 31% (0.31x) speedup for tensor_to_value in google/cloud/aiplatform/_streaming_prediction.py

⏱️ Runtime : 8.69 milliseconds → 6.65 milliseconds (best of 264 runs)

📝 Explanation and details

The optimized version achieves a 30% speedup by eliminating redundant ListFields() calls and streamlining the execution path:

Key optimizations:

  1. Eliminated duplicate ListFields() call: The original code called tensor_pb.ListFields() twice - once to get the list and again to extract descriptor, value. The optimized version stores the result in list_of_fields and reuses it with list_of_fields[0], reducing expensive protobuf method calls.

  2. Cached descriptor name: Instead of accessing descriptor.name multiple times in conditional checks, the optimized version stores it as name = descriptor.name once, reducing attribute access overhead.

  3. Streamlined final return logic: The original version had separate if len(value) == 1 and else blocks with explicit returns. The optimized version uses a single ternary expression return value[0] if len(value) == 1 else value, trimming a few bytecode instructions from the common return path.
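Taken together, the three changes amount to a small rewrite of the function body. The sketch below is a hypothetical reconstruction based on this description and the mock-based tests further down, not the actual source of google/cloud/aiplatform/_streaming_prediction.py; FakeTensor and FakeDescriptor are stand-ins for the real protobuf types:

```python
from collections.abc import Sequence

class FakeDescriptor:
    def __init__(self, name):
        self.name = name

class FakeTensor:
    """Duck-typed stand-in exposing only the ListFields() interface used here."""
    def __init__(self, field_name=None, value=None):
        self._field_name, self._value = field_name, value

    def ListFields(self):
        if self._field_name is None:
            return []
        return [(FakeDescriptor(self._field_name), self._value)]

def tensor_to_value_optimized(tensor_pb):
    # One ListFields() call, stored and reused instead of calling it twice.
    list_of_fields = tensor_pb.ListFields()
    if not list_of_fields:
        return None
    descriptor, value = list_of_fields[0]
    # Cache the attribute lookup instead of reading descriptor.name per branch.
    name = descriptor.name
    if name == "list_val":
        return [tensor_to_value_optimized(t) for t in value]
    if name == "struct_val":
        return {k: tensor_to_value_optimized(t) for k, t in value.items()}
    if not isinstance(value, Sequence):
        raise TypeError(f"Unexpected non-sequence value: {value!r}")
    # Single ternary replaces separate if/else blocks with explicit returns.
    return value[0] if len(value) == 1 else value

print(tensor_to_value_optimized(FakeTensor("int_val", [42])))       # 42
print(tensor_to_value_optimized(FakeTensor("int_val", [1, 2, 3])))  # [1, 2, 3]
```

The recursive list_val/struct_val branches are where the savings compound, since every nested element pays the ListFields() cost once instead of twice.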

Why it's faster:

  • ListFields() is a relatively expensive protobuf operation that involves introspection of the message structure. Eliminating one call per function invocation directly reduces computation.
  • Attribute access (descriptor.name) has lookup overhead in Python. Caching it in a local variable provides faster access.
  • A single return with a ternary expression executes slightly fewer bytecode instructions than separate if/else blocks with explicit returns.

Performance characteristics:
The optimizations show consistent 10-32% improvements across all test cases, with particularly strong gains (30%+) on large-scale scenarios involving 1000+ elements. The benefits compound in nested structures where tensor_to_value is called recursively many times.
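The attribute-caching point can be observed in isolation with a quick timeit comparison. This is an illustrative micro-benchmark, not codeflash's measurement harness; Descriptor is a stand-in class:

```python
import timeit

class Descriptor:
    def __init__(self, name):
        self.name = name

d = Descriptor("int_val")

# Two attribute reads per check, mirroring the original descriptor.name usage.
repeated = timeit.timeit(
    "d.name == 'list_val' or d.name == 'struct_val'",
    globals={"d": d}, number=100_000)

# One attribute read cached into a local, mirroring the optimized version.
cached = timeit.timeit(
    "n = d.name; n == 'list_val' or n == 'struct_val'",
    globals={"d": d}, number=100_000)

print(f"repeated: {repeated:.4f}s  cached: {cached:.4f}s")
```

Absolute numbers vary by machine and interpreter; the point is only that each eliminated attribute lookup shaves a constant cost that multiplies across recursive calls.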

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 54 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests and Runtime
from typing import Any, Sequence

# imports
import pytest  # used for our unit tests
from aiplatform._streaming_prediction import tensor_to_value

# Mocks for google.cloud.aiplatform_v1.types.Tensor and its ListFields method
# since we cannot import the actual Tensor type, we will create a minimal mock
# that is compatible with the tensor_to_value implementation.


class MockDescriptor:
    def __init__(self, name):
        self.name = name

class MockTensor:
    """
    A mock Tensor object that mimics the minimal interface needed for tensor_to_value.
    """
    def __init__(self, field_name=None, value=None):
        self._field_name = field_name
        self._value = value

    def ListFields(self):
        # Returns a list of (descriptor, value) tuples as expected by tensor_to_value
        if self._field_name is None:
            return []
        return [(MockDescriptor(self._field_name), self._value)]

# unit tests

# 1. Basic Test Cases

def test_empty_tensor_returns_none():
    # Test that an empty tensor returns None
    tensor = MockTensor()
    codeflash_output = tensor_to_value(tensor) # 617ns -> 555ns (11.2% faster)

def test_single_int_value():
    # Test tensor with a single integer value
    tensor = MockTensor('int_val', [42])
    codeflash_output = tensor_to_value(tensor) # 2.92μs -> 2.56μs (14.0% faster)

def test_single_float_value():
    # Test tensor with a single float value
    tensor = MockTensor('float_val', [3.14])
    codeflash_output = tensor_to_value(tensor) # 2.67μs -> 2.31μs (15.9% faster)

def test_single_string_value():
    # Test tensor with a single string value
    tensor = MockTensor('string_val', ['hello'])
    codeflash_output = tensor_to_value(tensor) # 2.72μs -> 2.42μs (12.5% faster)

def test_multiple_int_values():
    # Test tensor with multiple integer values
    tensor = MockTensor('int_val', [1, 2, 3])
    codeflash_output = tensor_to_value(tensor) # 2.68μs -> 2.41μs (11.3% faster)

def test_multiple_float_values():
    # Test tensor with multiple float values
    tensor = MockTensor('float_val', [1.1, 2.2, 3.3])
    codeflash_output = tensor_to_value(tensor) # 2.63μs -> 2.36μs (11.4% faster)

def test_multiple_string_values():
    # Test tensor with multiple string values
    tensor = MockTensor('string_val', ['a', 'b', 'c'])
    codeflash_output = tensor_to_value(tensor) # 2.66μs -> 2.41μs (10.3% faster)

def test_list_val_of_ints():
    # Test tensor with list_val field containing tensors of ints
    tensor = MockTensor('list_val', [
        MockTensor('int_val', [1]),
        MockTensor('int_val', [2]),
        MockTensor('int_val', [3])
    ])
    codeflash_output = tensor_to_value(tensor) # 6.00μs -> 5.18μs (15.7% faster)

def test_list_val_of_strings():
    # Test tensor with list_val field containing tensors of strings
    tensor = MockTensor('list_val', [
        MockTensor('string_val', ['x']),
        MockTensor('string_val', ['y']),
        MockTensor('string_val', ['z'])
    ])
    codeflash_output = tensor_to_value(tensor) # 5.81μs -> 4.96μs (17.1% faster)

def test_struct_val_simple():
    # Test tensor with struct_val field containing simple key-value pairs
    tensor = MockTensor('struct_val', {
        'foo': MockTensor('int_val', [10]),
        'bar': MockTensor('string_val', ['baz'])
    })
    codeflash_output = tensor_to_value(tensor) # 5.27μs -> 4.62μs (14.1% faster)

def test_struct_val_nested():
    # Test tensor with nested struct_val
    tensor = MockTensor('struct_val', {
        'outer': MockTensor('struct_val', {
            'inner': MockTensor('int_val', [99])
        }),
        'other': MockTensor('string_val', ['hello'])
    })
    codeflash_output = tensor_to_value(tensor) # 6.24μs -> 5.28μs (18.2% faster)

# 2. Edge Test Cases

def test_struct_val_empty_dict():
    # Test struct_val with empty dictionary
    tensor = MockTensor('struct_val', {})
    codeflash_output = tensor_to_value(tensor) # 1.88μs -> 1.62μs (16.3% faster)

def test_list_val_empty_list():
    # Test list_val with empty list
    tensor = MockTensor('list_val', [])
    codeflash_output = tensor_to_value(tensor) # 1.70μs -> 1.35μs (25.1% faster)

def test_list_val_of_structs():
    # Test list_val containing struct_val tensors
    tensor = MockTensor('list_val', [
        MockTensor('struct_val', {'a': MockTensor('int_val', [1])}),
        MockTensor('struct_val', {'b': MockTensor('int_val', [2])}),
    ])
    codeflash_output = tensor_to_value(tensor) # 7.19μs -> 5.88μs (22.3% faster)

def test_struct_val_with_list_val():
    # Test struct_val containing list_val tensors
    tensor = MockTensor('struct_val', {
        'numbers': MockTensor('list_val', [
            MockTensor('int_val', [5]),
            MockTensor('int_val', [6])
        ]),
        'letters': MockTensor('list_val', [
            MockTensor('string_val', ['x']),
            MockTensor('string_val', ['y'])
        ])
    })
    codeflash_output = tensor_to_value(tensor) # 8.73μs -> 7.29μs (19.8% faster)

def test_non_sequence_value_raises_typeerror():
    # Test that a non-sequence value raises TypeError
    tensor = MockTensor('int_val', 123)  # not a list
    with pytest.raises(TypeError):
        tensor_to_value(tensor) # 3.22μs -> 3.08μs (4.51% faster)

def test_struct_val_with_non_tensor_value():
    # Test struct_val with non-Tensor values (should raise AttributeError)
    tensor = MockTensor('struct_val', {
        'bad': 100
    })
    with pytest.raises(AttributeError):
        tensor_to_value(tensor) # 2.88μs -> 2.61μs (10.3% faster)

def test_list_val_with_non_tensor_value():
    # Test list_val with non-Tensor values (should raise AttributeError)
    tensor = MockTensor('list_val', [42, 43])
    with pytest.raises(AttributeError):
        tensor_to_value(tensor) # 2.51μs -> 2.25μs (11.3% faster)

def test_struct_val_with_none_value():
    # Test struct_val with None value
    tensor = MockTensor('struct_val', {'x': None})
    with pytest.raises(AttributeError):
        tensor_to_value(tensor) # 2.62μs -> 2.46μs (6.30% faster)

def test_tensor_with_unexpected_field_name():
    # Test tensor with an unexpected field name (treated as a plain sequence)
    tensor = MockTensor('unknown_field', [1, 2])
    # Should treat as sequence, so returns [1, 2]
    codeflash_output = tensor_to_value(tensor) # 3.20μs -> 2.93μs (9.47% faster)

def test_tensor_with_unexpected_non_sequence_field():
    # Test tensor with an unexpected field name and non-sequence value
    tensor = MockTensor('unknown_field', 999)
    with pytest.raises(TypeError):
        tensor_to_value(tensor) # 3.35μs -> 3.22μs (4.00% faster)

def test_tensor_with_multiple_fields_returns_first():
    # Test tensor with multiple fields (simulate by overriding ListFields)
    class MultiFieldTensor(MockTensor):
        def ListFields(self):
            return [
                (MockDescriptor('int_val'), [1]),
                (MockDescriptor('float_val'), [2.0])
            ]
    tensor = MultiFieldTensor()
    # Should only process the first field
    codeflash_output = tensor_to_value(tensor) # 3.74μs -> 3.28μs (14.0% faster)

# 3. Large Scale Test Cases

def test_large_list_val_of_ints():
    # Test tensor with a large list_val of int tensors (1000 elements)
    tensor = MockTensor('list_val', [
        MockTensor('int_val', [i]) for i in range(1000)
    ])
    codeflash_output = tensor_to_value(tensor); result = codeflash_output # 791μs -> 605μs (30.8% faster)

def test_large_struct_val():
    # Test tensor with a large struct_val (1000 key-value pairs)
    tensor = MockTensor('struct_val', {
        f'key{i}': MockTensor('int_val', [i]) for i in range(1000)
    })
    codeflash_output = tensor_to_value(tensor); result = codeflash_output # 842μs -> 650μs (29.5% faster)

def test_large_nested_struct_and_list():
    # Test tensor with large nested struct_val and list_val
    tensor = MockTensor('struct_val', {
        'ints': MockTensor('list_val', [
            MockTensor('int_val', [i]) for i in range(500)
        ]),
        'floats': MockTensor('list_val', [
            MockTensor('float_val', [float(i)]) for i in range(500)
        ])
    })
    codeflash_output = tensor_to_value(tensor); result = codeflash_output # 803μs -> 607μs (32.3% faster)

def test_large_list_val_of_structs():
    # Test tensor with a large list_val of struct_val tensors (1000 elements)
    tensor = MockTensor('list_val', [
        MockTensor('struct_val', {'num': MockTensor('int_val', [i])})
        for i in range(1000)
    ])
    codeflash_output = tensor_to_value(tensor); result = codeflash_output # 1.59ms -> 1.21ms (31.6% faster)

def test_deeply_nested_struct_and_list():
    # Test tensor with deeply nested struct_val and list_val (depth 5)
    def make_nested(depth):
        if depth == 0:
            return MockTensor('int_val', [depth])
        else:
            return MockTensor('struct_val', {
                f'level{depth}': MockTensor('list_val', [
                    make_nested(depth - 1),
                    make_nested(depth - 1)
                ])
            })
    tensor = make_nested(5)
    codeflash_output = tensor_to_value(tensor); result = codeflash_output # 71.1μs -> 54.1μs (31.5% faster)

def test_large_tensor_memory_limit():
    # Test tensor with a large list_val, but not exceeding 100MB
    # Each int is 28 bytes in Python, so 100MB/28 ~ 3.5M ints, but we keep it at 1000 for safety
    tensor = MockTensor('list_val', [
        MockTensor('int_val', [i]) for i in range(1000)
    ])
    codeflash_output = tensor_to_value(tensor); result = codeflash_output # 797μs -> 605μs (31.9% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import Any, Sequence

# imports
import pytest  # used for our unit tests
from aiplatform._streaming_prediction import tensor_to_value

# function to test
# -*- coding: utf-8 -*-

# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#


# --- Mock Tensor class for testing ---
class MockDescriptor:
    def __init__(self, name):
        self.name = name

class MockTensor:
    """A minimal mock of aiplatform_types.Tensor for unit testing."""
    def __init__(self, field_name=None, value=None):
        # field_name: str, one of "list_val", "struct_val", or a scalar type
        # value: the value to store
        self._field_name = field_name
        self._value = value

    def ListFields(self):
        # Returns a list of (descriptor, value) tuples
        if self._field_name is None:
            return []
        return [(MockDescriptor(self._field_name), self._value)]

# --- Unit Tests ---

# ----------- BASIC TEST CASES -----------

def test_empty_tensor_returns_none():
    """Test that an empty tensor returns None."""
    t = MockTensor()
    codeflash_output = tensor_to_value(t) # 657ns -> 572ns (14.9% faster)

def test_single_scalar_int():
    """Test a tensor with a single integer value."""
    t = MockTensor(field_name="int_val", value=[42])
    codeflash_output = tensor_to_value(t) # 3.07μs -> 2.64μs (16.4% faster)

def test_single_scalar_float():
    """Test a tensor with a single float value."""
    t = MockTensor(field_name="float_val", value=[3.14])
    codeflash_output = tensor_to_value(t) # 2.80μs -> 2.37μs (18.3% faster)

def test_single_scalar_str():
    """Test a tensor with a single string value."""
    t = MockTensor(field_name="string_val", value=["hello"])
    codeflash_output = tensor_to_value(t) # 2.74μs -> 2.42μs (13.3% faster)

def test_multiple_scalar_ints():
    """Test a tensor with multiple integer values."""
    t = MockTensor(field_name="int_val", value=[1, 2, 3])
    codeflash_output = tensor_to_value(t) # 2.76μs -> 2.43μs (13.6% faster)

def test_multiple_scalar_floats():
    """Test a tensor with multiple float values."""
    t = MockTensor(field_name="float_val", value=[1.1, 2.2, 3.3])
    codeflash_output = tensor_to_value(t) # 2.66μs -> 2.40μs (10.8% faster)

def test_multiple_scalar_strs():
    """Test a tensor with multiple string values."""
    t = MockTensor(field_name="string_val", value=["a", "b", "c"])
    codeflash_output = tensor_to_value(t) # 2.72μs -> 2.34μs (16.4% faster)

def test_list_val_with_scalars():
    """Test a tensor with list_val containing scalar tensors."""
    t = MockTensor(
        field_name="list_val",
        value=[
            MockTensor(field_name="int_val", value=[1]),
            MockTensor(field_name="int_val", value=[2]),
            MockTensor(field_name="int_val", value=[3])
        ]
    )
    codeflash_output = tensor_to_value(t) # 6.03μs -> 5.21μs (15.8% faster)

def test_struct_val_with_scalars():
    """Test a tensor with struct_val containing scalar tensors."""
    t = MockTensor(
        field_name="struct_val",
        value={
            "x": MockTensor(field_name="float_val", value=[1.5]),
            "y": MockTensor(field_name="int_val", value=[2])
        }
    )
    codeflash_output = tensor_to_value(t) # 5.11μs -> 4.61μs (10.8% faster)

def test_nested_list_val():
    """Test a tensor with nested list_val."""
    t = MockTensor(
        field_name="list_val",
        value=[
            MockTensor(field_name="list_val", value=[
                MockTensor(field_name="int_val", value=[1]),
                MockTensor(field_name="int_val", value=[2])
            ]),
            MockTensor(field_name="list_val", value=[
                MockTensor(field_name="int_val", value=[3]),
                MockTensor(field_name="int_val", value=[4])
            ])
        ]
    )
    codeflash_output = tensor_to_value(t) # 8.36μs -> 6.67μs (25.4% faster)

def test_nested_struct_val():
    """Test a tensor with nested struct_val."""
    t = MockTensor(
        field_name="struct_val",
        value={
            "a": MockTensor(field_name="struct_val", value={
                "x": MockTensor(field_name="int_val", value=[10]),
                "y": MockTensor(field_name="int_val", value=[20])
            }),
            "b": MockTensor(field_name="struct_val", value={
                "z": MockTensor(field_name="float_val", value=[3.14])
            })
        }
    )
    codeflash_output = tensor_to_value(t) # 7.63μs -> 6.54μs (16.6% faster)

def test_struct_val_with_list_val():
    """Test a tensor with struct_val containing list_val."""
    t = MockTensor(
        field_name="struct_val",
        value={
            "values": MockTensor(field_name="list_val", value=[
                MockTensor(field_name="int_val", value=[1]),
                MockTensor(field_name="int_val", value=[2])
            ])
        }
    )
    codeflash_output = tensor_to_value(t) # 5.69μs -> 4.91μs (15.8% faster)

# ----------- EDGE TEST CASES -----------

def test_struct_val_empty_dict():
    """Test struct_val with an empty dictionary."""
    t = MockTensor(field_name="struct_val", value={})
    codeflash_output = tensor_to_value(t) # 1.86μs -> 1.59μs (17.6% faster)

def test_list_val_empty_list():
    """Test list_val with an empty list."""
    t = MockTensor(field_name="list_val", value=[])
    codeflash_output = tensor_to_value(t) # 1.67μs -> 1.36μs (23.1% faster)

def test_struct_val_with_none_value():
    """Test struct_val with a None value as a field."""
    t = MockTensor(
        field_name="struct_val",
        value={
            "missing": MockTensor()
        }
    )
    codeflash_output = tensor_to_value(t) # 2.19μs -> 1.90μs (15.7% faster)

def test_list_val_with_none_tensor():
    """Test list_val containing a tensor with no fields."""
    t = MockTensor(
        field_name="list_val",
        value=[
            MockTensor(field_name="int_val", value=[1]),
            MockTensor()
        ]
    )
    codeflash_output = tensor_to_value(t) # 4.34μs -> 3.79μs (14.5% faster)

def test_struct_val_with_list_and_struct():
    """Test struct_val containing both list_val and struct_val fields."""
    t = MockTensor(
        field_name="struct_val",
        value={
            "list": MockTensor(field_name="list_val", value=[
                MockTensor(field_name="int_val", value=[1]),
                MockTensor(field_name="int_val", value=[2])
            ]),
            "struct": MockTensor(field_name="struct_val", value={
                "a": MockTensor(field_name="string_val", value=["foo"])
            })
        }
    )
    codeflash_output = tensor_to_value(t) # 8.48μs -> 7.08μs (19.8% faster)

def test_non_sequence_value_raises():
    """Test that a non-sequence value raises TypeError."""
    t = MockTensor(field_name="int_val", value=42)  # Not a list
    with pytest.raises(TypeError):
        tensor_to_value(t) # 3.22μs -> 3.00μs (7.22% faster)

def test_sequence_with_zero_length():
    """Test a field with a zero-length sequence."""
    t = MockTensor(field_name="int_val", value=[])
    codeflash_output = tensor_to_value(t) # 2.76μs -> 2.45μs (12.9% faster)

def test_sequence_with_multiple_types():
    """Test a list_val containing tensors of different scalar types."""
    t = MockTensor(
        field_name="list_val",
        value=[
            MockTensor(field_name="int_val", value=[1]),
            MockTensor(field_name="float_val", value=[2.5]),
            MockTensor(field_name="string_val", value=["abc"])
        ]
    )
    codeflash_output = tensor_to_value(t) # 5.99μs -> 5.10μs (17.5% faster)

def test_struct_val_with_non_tensor_value():
    """Test struct_val with a non-tensor value (should raise TypeError)."""
    t = MockTensor(
        field_name="struct_val",
        value={
            "x": 123  # Not a MockTensor
        }
    )
    with pytest.raises(AttributeError):
        tensor_to_value(t) # 2.80μs -> 2.64μs (6.02% faster)

# ----------- LARGE SCALE TEST CASES -----------

def test_large_list_val():
    """Test a list_val with a large number of scalar tensors."""
    n = 1000
    t = MockTensor(
        field_name="list_val",
        value=[MockTensor(field_name="int_val", value=[i]) for i in range(n)]
    )
    codeflash_output = tensor_to_value(t); result = codeflash_output # 808μs -> 613μs (31.8% faster)

def test_large_struct_val():
    """Test a struct_val with a large number of fields."""
    n = 1000
    t = MockTensor(
        field_name="struct_val",
        value={str(i): MockTensor(field_name="int_val", value=[i]) for i in range(n)}
    )
    codeflash_output = tensor_to_value(t); result = codeflash_output # 843μs -> 654μs (29.0% faster)

def test_large_nested_list_struct():
    """Test large nested list_val and struct_val."""
    n = 100
    t = MockTensor(
        field_name="list_val",
        value=[
            MockTensor(field_name="struct_val", value={
                "x": MockTensor(field_name="int_val", value=[i]),
                "y": MockTensor(field_name="float_val", value=[float(i)])
            }) for i in range(n)
        ]
    )
    codeflash_output = tensor_to_value(t); result = codeflash_output # 241μs -> 182μs (32.5% faster)

def test_large_multi_dimensional_list():
    """Test a large multi-dimensional list_val."""
    n = 50
    m = 20
    t = MockTensor(
        field_name="list_val",
        value=[
            MockTensor(
                field_name="list_val",
                value=[
                    MockTensor(field_name="int_val", value=[i * m + j])
                    for j in range(m)
                ]
            )
            for i in range(n)
        ]
    )
    codeflash_output = tensor_to_value(t); result = codeflash_output # 846μs -> 647μs (30.7% faster)

def test_large_struct_with_lists():
    """Test a struct_val with many list_val fields."""
    n = 100
    m = 10
    t = MockTensor(
        field_name="struct_val",
        value={
            f"list_{i}": MockTensor(
                field_name="list_val",
                value=[MockTensor(field_name="int_val", value=[j]) for j in range(m)]
            )
            for i in range(n)
        }
    )
    codeflash_output = tensor_to_value(t); result = codeflash_output # 887μs -> 677μs (30.9% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-tensor_to_value-mgkk3x1u and push.

codeflash-ai bot requested a review from mashraf-222 on October 10, 2025 at 08:00
codeflash-ai bot added the "⚡️ codeflash" label (Optimization PR opened by Codeflash AI) on Oct 10, 2025