codeflash-ai bot commented Oct 8, 2025

📄 2,940% (29.40x) speedup for PipelineRuntimeConfigBuilder.build in google/cloud/aiplatform/utils/pipeline_utils.py

⏱️ Runtime: 18.1 milliseconds → 595 microseconds (best of 391 runs)

📝 Explanation and details

The optimized code achieves a 2939% speedup by pre-computing version comparisons in the constructor instead of repeating expensive packaging.version.parse() calls during execution.

Key optimizations:

  1. Version parsing moved to __init__: The original code called packaging.version.parse(self._schema_version) and packaging.version.parse("2.0.0") on every build() and _get_vertex_value() call. The optimized version parses these once during initialization and stores the results as self._parsed_schema_version and self._is_version_gt_2.

  2. Boolean flag for version comparison: Instead of re-evaluating packaging.version.parse(self._schema_version) > packaging.version.parse("2.0.0") on each call, build() and _get_vertex_value() read the pre-computed boolean self._is_version_gt_2.
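A minimal sketch of the pattern described above, assuming the packaging library is installed. CachedVersionConfigBuilder is a hypothetical, simplified stand-in, not the code from this PR: it models only the pipeline_root check and parameter handling, and the intValue/doubleValue/stringValue wrapper keys used for schema versions up to 2.0.0 are assumptions. The schema version is parsed and compared once in __init__, and both build() and _get_vertex_value() read the cached self._is_version_gt_2 flag.

```python
from typing import Any, Dict, Mapping, Optional

import packaging.version

# Assumed wrapper keys for schema versions <= 2.0.0 (hypothetical for this sketch).
_WRAPPER_KEYS = {"INT": "intValue", "DOUBLE": "doubleValue", "STRING": "stringValue"}


class CachedVersionConfigBuilder:
    """Hypothetical, simplified stand-in for PipelineRuntimeConfigBuilder."""

    def __init__(
        self,
        pipeline_root: str,
        schema_version: str,
        parameter_types: Mapping[str, str],
        parameter_values: Optional[Dict[str, Any]] = None,
    ) -> None:
        self._pipeline_root = pipeline_root
        self._parameter_types = parameter_types
        self._parameter_values = parameter_values or {}
        # Both version strings are parsed and compared exactly once, here.
        self._parsed_schema_version = packaging.version.parse(schema_version)
        self._is_version_gt_2 = self._parsed_schema_version > packaging.version.parse("2.0.0")

    def build(self) -> Dict[str, Any]:
        if not self._pipeline_root:
            raise ValueError("Pipeline root must be specified")
        # Reads the cached flag instead of re-parsing both versions.
        key = "parameterValues" if self._is_version_gt_2 else "parameters"
        # Other runtime-config fields (artifacts, failure policy, default runtime) are omitted.
        return {
            key: {
                name: self._get_vertex_value(name, value)
                for name, value in self._parameter_values.items()
                if value is not None
            }
        }

    def _get_vertex_value(self, name: str, value: Any) -> Any:
        if name not in self._parameter_types:
            raise ValueError(f"The pipeline parameter {name} is not found")
        if self._is_version_gt_2:
            return value  # schema > 2.0.0: values pass through unwrapped
        type_name = self._parameter_types[name]
        if type_name not in _WRAPPER_KEYS:
            raise TypeError(f"Got unknown type of value: {value}")
        return {_WRAPPER_KEYS[type_name]: value}


if __name__ == "__main__":
    builder = CachedVersionConfigBuilder("gs://root", "2.0.0", {"p": "INT"}, {"p": 42})
    print(builder.build())  # {'parameters': {'p': {'intValue': 42}}}
```

Caching the flag is safe here because the comparison result is fixed once the builder is constructed; it would go stale only if the schema version could change afterwards.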

Why this is faster:

  • packaging.version.parse() runs a regular-expression match and constructs a Version object on every call, which is far more expensive than reading a cached attribute
  • In the original code, these parsing operations dominated execution time (95.4% of the time spent in _get_vertex_value, plus a significant share of build())
  • The optimization eliminates redundant parsing, which matters most because _get_vertex_value() is called once per non-None parameter value in every build() invocation
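To compare the per-call cost on your own machine, a small timeit measurement like the one below can help. It assumes packaging is installed and uses "2.1.0" (one of the schema versions in the generated tests) as an example; the function names are illustrative. No expected timings are given, because they depend on the hardware and the installed packaging version.

```python
import timeit

import packaging.version

# Example schema version, as used in several of the generated tests.
SCHEMA_VERSION = "2.1.0"


def compare_by_parsing() -> bool:
    """Original pattern: parse both version strings on every call."""
    return packaging.version.parse(SCHEMA_VERSION) > packaging.version.parse("2.0.0")


# Optimized pattern: parse and compare once, then reuse the result.
IS_VERSION_GT_2 = compare_by_parsing()


def compare_cached() -> bool:
    """Optimized pattern: return the precomputed boolean."""
    return IS_VERSION_GT_2


if __name__ == "__main__":
    calls = 10_000
    parse_seconds = timeit.timeit(compare_by_parsing, number=calls)
    cached_seconds = timeit.timeit(compare_cached, number=calls)
    # Absolute numbers depend on the machine and the installed packaging version.
    print(f"parse on every call: {parse_seconds / calls * 1e6:.3f} us per call")
    print(f"cached boolean:      {cached_seconds / calls * 1e6:.3f} us per call")
```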

Test case performance:

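  • Large-scale scenarios benefit most: tests with 300-500 parameters show roughly 2,700-5,600% speedups because _get_vertex_value() is called once per parameter; the 500-artifact test with no parameters improves only about 41%
  • Basic scenarios: even simple cases show roughly 700-1,200% improvements from eliminating version parsing overhead
  • Edge cases: errors raised during parameter validation still improve roughly 470-660%, because the original build() parsed the schema version before validating parameters; the missing-pipeline_root case improves only about 16%, since it raises before any version parsing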

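The optimization is particularly effective for pipelines with many parameters, because in the original code the number of version comparisons grew linearly with the parameter count. Note that the reported runtimes measure build() only; the remaining two parse calls now run once per builder in __init__.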
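The linear-scaling claim can be checked by counting parse calls. This is a model, not the PR's code: CountingParser and the two counting functions are hypothetical helpers that replay the call pattern described above (originally one comparison in build() plus one per non-None parameter in _get_vertex_value(), versus a single comparison in __init__ after the change). Each comparison parses two strings, so for N parameters the original pattern makes 2 + 2N parse calls per build(), while the optimized pattern makes 2 per builder. It assumes packaging is installed.

```python
import packaging.version


class CountingParser:
    """Calls packaging.version.parse and counts how many times it was used."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, text: str) -> packaging.version.Version:
        self.calls += 1
        return packaging.version.parse(text)


def parse_calls_original(num_params: int, schema_version: str = "2.1.0") -> int:
    """Model of the original: one comparison in build(), one per parameter."""
    parse = CountingParser()
    _ = parse(schema_version) > parse("2.0.0")  # build() picks the parameter key
    for _param in range(num_params):
        _ = parse(schema_version) <= parse("2.0.0")  # inside _get_vertex_value()
    return parse.calls


def parse_calls_optimized(num_params: int, schema_version: str = "2.1.0") -> int:
    """Model of the optimized code: one comparison in __init__, then flag reads."""
    parse = CountingParser()
    is_version_gt_2 = parse(schema_version) > parse("2.0.0")  # __init__()
    for _param in range(num_params):
        _ = is_version_gt_2  # a plain attribute read; no parsing
    return parse.calls


if __name__ == "__main__":
    for n in (1, 100, 500):
        print(n, parse_calls_original(n), parse_calls_optimized(n))
    # prints:
    # 1 4 2
    # 100 202 2
    # 500 1002 2
```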

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 94 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests and Runtime
import copy
from typing import Any, Dict, Mapping, Optional, Union

import packaging.version
# imports
import pytest  # used for our unit tests
from google.cloud.aiplatform.utils.pipeline_utils import PipelineRuntimeConfigBuilder


# Dummy pipeline_failure_policy for test purposes
class DummyPipelineFailurePolicy:
    def __init__(self, policy_name):
        self.policy_name = policy_name

    def __eq__(self, other):
        return isinstance(other, DummyPipelineFailurePolicy) and self.policy_name == other.policy_name

    def __repr__(self):
        return f"DummyPipelineFailurePolicy({self.policy_name!r})"

# unit tests

# ------------------------
# Basic Test Cases
# ------------------------

def test_basic_int_parameter_v1():
    """Test basic INT parameter for schema_version <= 2.0.0"""
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://my-bucket/pipeline",
        schema_version="2.0.0",
        parameter_types={"param1": "INT"},
        parameter_values={"param1": 42}
    )
    codeflash_output = builder.build(); config = codeflash_output # 21.9μs -> 2.11μs (934% faster)

def test_basic_string_parameter_v1():
    """Test basic STRING parameter for schema_version <= 2.0.0"""
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="1.5.0",
        parameter_types={"p": "STRING"},
        parameter_values={"p": "hello"}
    )
    codeflash_output = builder.build(); config = codeflash_output # 22.5μs -> 2.34μs (860% faster)

def test_basic_double_parameter_v1():
    """Test DOUBLE parameter for schema_version <= 2.0.0"""
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.0.0",
        parameter_types={"d": "DOUBLE"},
        parameter_values={"d": 3.14}
    )
    codeflash_output = builder.build(); config = codeflash_output # 21.9μs -> 2.19μs (901% faster)

def test_basic_parameters_v2():
    """Test parameters for schema_version > 2.0.0 (no wrapping)"""
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={"x": "STRING", "y": "INT"},
        parameter_values={"x": "abc", "y": 7}
    )
    codeflash_output = builder.build(); config = codeflash_output # 28.2μs -> 2.13μs (1222% faster)

def test_basic_input_artifacts():
    """Test input_artifacts are mapped correctly"""
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={},
        input_artifacts={"art1": "id1", "art2": "id2"}
    )
    codeflash_output = builder.build(); config = codeflash_output # 14.6μs -> 1.83μs (698% faster)

def test_basic_default_runtime():
    """Test default_runtime is included if provided"""
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={},
        default_runtime={"foo": "bar"}
    )
    codeflash_output = builder.build(); config = codeflash_output # 14.2μs -> 1.65μs (761% faster)

def test_basic_failure_policy():
    """Test failure_policy is included if provided"""
    fp = DummyPipelineFailurePolicy("FAIL_FAST")
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={},
        failure_policy=fp
    )
    codeflash_output = builder.build(); config = codeflash_output # 14.2μs -> 1.62μs (778% faster)

def test_basic_empty_parameter_values_and_artifacts():
    """Test empty parameter_values and input_artifacts"""
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={}
    )
    codeflash_output = builder.build(); config = codeflash_output # 14.2μs -> 1.46μs (875% faster)

# ------------------------
# Edge Test Cases
# ------------------------

def test_missing_pipeline_root_raises():
    """Test ValueError is raised if pipeline_root is missing/empty"""
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="",
        schema_version="2.1.0",
        parameter_types={}
    )
    with pytest.raises(ValueError, match="Pipeline root must be specified"):
        builder.build() # 953ns -> 815ns (16.9% faster)

def test_parameter_value_none_filtered():
    """Test that None parameter values are filtered out (not present in config)"""
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={"a": "INT", "b": "STRING"},
        parameter_values={"a": None, "b": "value"}
    )
    codeflash_output = builder.build(); config = codeflash_output # 23.7μs -> 2.16μs (999% faster)

def test_parameter_name_not_in_types_raises():
    """Test that unknown parameter name raises ValueError"""
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.0.0",
        parameter_types={"x": "INT"},
        parameter_values={"y": 5}
    )
    with pytest.raises(ValueError, match="pipeline parameter y is not found"):
        builder.build() # 15.5μs -> 2.68μs (478% faster)

def test_parameter_type_unknown_raises():
    """Test that unknown parameter type raises TypeError"""
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.0.0",
        parameter_types={"x": "BOOL"},  # unsupported type
        parameter_values={"x": True}
    )
    with pytest.raises(TypeError, match="Got unknown type of value"):
        builder.build() # 24.1μs -> 3.51μs (587% faster)

def test_schema_version_boundary():
    """Test schema_version at the boundary (exactly 2.0.0) uses 'parameters'"""
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.0.0",
        parameter_types={"x": "STRING"},
        parameter_values={"x": "test"}
    )
    codeflash_output = builder.build(); config = codeflash_output # 23.0μs -> 2.56μs (800% faster)

def test_schema_version_above_boundary():
    """Test schema_version just above boundary uses 'parameterValues'"""
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.0.1",
        parameter_types={"x": "STRING"},
        parameter_values={"x": "test"}
    )
    codeflash_output = builder.build(); config = codeflash_output # 22.3μs -> 2.10μs (959% faster)

def test_parameter_values_with_list_and_dict_v2():
    """Test that list/dict parameter values work for schema_version > 2.0.0"""
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={"l": "STRING", "d": "STRING"},
        parameter_values={"l": [1,2,3], "d": {"a":1}}
    )
    codeflash_output = builder.build(); config = codeflash_output # 28.4μs -> 2.17μs (1207% faster)

def test_input_artifacts_empty_dict():
    """Test input_artifacts as empty dict yields empty mapping"""
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={},
        input_artifacts={}
    )
    codeflash_output = builder.build(); config = codeflash_output # 14.4μs -> 1.57μs (814% faster)

def test_default_runtime_not_included_if_none():
    """Test default_runtime is not included if None"""
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={},
        default_runtime=None
    )
    codeflash_output = builder.build(); config = codeflash_output # 14.7μs -> 1.50μs (882% faster)

def test_failure_policy_not_included_if_none():
    """Test failure_policy is not included if None"""
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={},
        failure_policy=None
    )
    codeflash_output = builder.build(); config = codeflash_output # 14.4μs -> 1.49μs (866% faster)

def test_parameter_values_with_zero_and_empty_string():
    """Test that zero and empty string are handled correctly (not filtered out)"""
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={"a": "INT", "b": "STRING"},
        parameter_values={"a": 0, "b": ""}
    )
    codeflash_output = builder.build(); config = codeflash_output # 28.8μs -> 2.23μs (1193% faster)

# ------------------------
# Large Scale Test Cases
# ------------------------

def test_large_number_of_parameters_v1():
    """Test builder with large number of parameters for schema_version <= 2.0.0"""
    N = 500
    parameter_types = {f"param{i}": "INT" for i in range(N)}
    parameter_values = {f"param{i}": i for i in range(N)}
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://large",
        schema_version="2.0.0",
        parameter_types=parameter_types,
        parameter_values=parameter_values
    )
    codeflash_output = builder.build(); config = codeflash_output # 3.25ms -> 115μs (2707% faster)
    for i in range(N):
        assert config["parameters"][f"param{i}"] == {"intValue": i}

def test_large_number_of_parameters_v2():
    """Test builder with large number of parameters for schema_version > 2.0.0"""
    N = 500
    parameter_types = {f"param{i}": "STRING" for i in range(N)}
    parameter_values = {f"param{i}": str(i) for i in range(N)}
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://large",
        schema_version="2.1.0",
        parameter_types=parameter_types,
        parameter_values=parameter_values
    )
    codeflash_output = builder.build(); config = codeflash_output # 3.14ms -> 56.2μs (5492% faster)
    for i in range(N):
        assert config["parameterValues"][f"param{i}"] == str(i)

def test_large_number_of_input_artifacts():
    """Test builder with large number of input_artifacts"""
    N = 500
    input_artifacts = {f"art{i}": f"id{i}" for i in range(N)}
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://large",
        schema_version="2.1.0",
        parameter_types={},
        input_artifacts=input_artifacts
    )
    codeflash_output = builder.build(); config = codeflash_output # 53.8μs -> 37.7μs (42.5% faster)
    for i in range(N):
        assert config["inputArtifacts"][f"art{i}"] == {"artifactId": f"id{i}"}

def test_large_number_of_parameters_and_artifacts():
    """Test builder with large number of parameters and input_artifacts"""
    N = 300
    parameter_types = {f"p{i}": "STRING" for i in range(N)}
    parameter_values = {f"p{i}": f"val{i}" for i in range(N)}
    input_artifacts = {f"a{i}": f"id{i}" for i in range(N)}
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://large",
        schema_version="2.1.0",
        parameter_types=parameter_types,
        parameter_values=parameter_values,
        input_artifacts=input_artifacts
    )
    codeflash_output = builder.build(); config = codeflash_output # 1.92ms -> 56.0μs (3325% faster)
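    for i in range(N):
        assert config["parameterValues"][f"p{i}"] == f"val{i}"
        assert config["inputArtifacts"][f"a{i}"] == {"artifactId": f"id{i}"}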

def test_large_default_runtime_and_failure_policy():
    """Test builder with large default_runtime and failure_policy"""
    default_runtime = {f"key{i}": f"val{i}" for i in range(100)}
    fp = DummyPipelineFailurePolicy("FAIL_SLOW")
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://large",
        schema_version="2.1.0",
        parameter_types={},
        default_runtime=default_runtime,
        failure_policy=fp
    )
    codeflash_output = builder.build(); config = codeflash_output # 15.3μs -> 1.77μs (766% faster)

def test_large_parameter_values_with_mixed_types_v2():
    """Test builder with large parameterValues of mixed types for schema_version > 2.0.0"""
    N = 200
    parameter_types = {f"i{i}": "INT" for i in range(N)}
    parameter_types.update({f"s{i}": "STRING" for i in range(N)})
    parameter_values = {f"i{i}": i for i in range(N)}
    parameter_values.update({f"s{i}": f"str{i}" for i in range(N)})
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://large",
        schema_version="2.1.0",
        parameter_types=parameter_types,
        parameter_values=parameter_values
    )
    codeflash_output = builder.build(); config = codeflash_output # 2.52ms -> 46.8μs (5291% faster)
    for i in range(N):
        assert config["parameterValues"][f"i{i}"] == i
        assert config["parameterValues"][f"s{i}"] == f"str{i}"
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import copy
from typing import Any, Dict, Mapping, Optional, Union

import packaging.version
# imports
import pytest  # used for our unit tests
from google.cloud.aiplatform.utils.pipeline_utils import PipelineRuntimeConfigBuilder

# function to test
# -*- coding: utf-8 -*-
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#


# Minimal stub for pipeline_failure_policy, since we can't import google.cloud.aiplatform.compat.types
class PipelineFailurePolicy:
    PIPELINE_FAILURE_POLICY_FAIL_SLOW = "FAIL_SLOW"
    PIPELINE_FAILURE_POLICY_FAIL_FAST = "FAIL_FAST"

pipeline_failure_policy = type("pipeline_failure_policy", (), {"PipelineFailurePolicy": PipelineFailurePolicy})

# unit tests

# ------------------- Basic Test Cases -------------------

def test_basic_int_parameter_v1():
    # Basic test: INT parameter, v1 schema
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://my-pipeline-root",
        schema_version="1.0.0",
        parameter_types={"param1": "INT"},
        parameter_values={"param1": 42},
    )
    codeflash_output = builder.build(); config = codeflash_output # 24.8μs -> 2.46μs (904% faster)

def test_basic_double_and_string_parameters_v1():
    # Basic test: DOUBLE and STRING parameter, v1 schema
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="1.0.0",
        parameter_types={"p1": "DOUBLE", "p2": "STRING"},
        parameter_values={"p1": 3.14, "p2": "hello"},
        input_artifacts={"input1": "artifact123"},
    )
    codeflash_output = builder.build(); config = codeflash_output # 29.0μs -> 2.79μs (938% faster)

def test_basic_parameter_values_v2():
    # Basic test: v2 schema uses parameterValues key and direct values
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={"p1": "INT", "p2": "STRING"},
        parameter_values={"p1": 99, "p2": "world"},
    )
    codeflash_output = builder.build(); config = codeflash_output # 28.5μs -> 2.17μs (1209% faster)

def test_basic_no_parameter_values():
    # Basic test: No parameter values provided
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="1.0.0",
        parameter_types={},
        parameter_values={},
    )
    codeflash_output = builder.build(); config = codeflash_output # 14.4μs -> 1.52μs (850% faster)

def test_basic_with_default_runtime_and_failure_policy():
    # Basic test: Default runtime and failure policy present
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={},
        parameter_values={},
        default_runtime={"some": "config"},
        failure_policy=pipeline_failure_policy.PipelineFailurePolicy.PIPELINE_FAILURE_POLICY_FAIL_FAST,
    )
    codeflash_output = builder.build(); config = codeflash_output # 14.2μs -> 1.73μs (717% faster)

# ------------------- Edge Test Cases -------------------

def test_edge_missing_pipeline_root():
    # Edge: pipeline_root is empty should raise ValueError
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="",
        schema_version="1.0.0",
        parameter_types={},
        parameter_values={},
    )
    with pytest.raises(ValueError, match="Pipeline root must be specified"):
        builder.build() # 914ns -> 790ns (15.7% faster)

def test_edge_missing_parameter_type():
    # Edge: parameter name not in parameter_types should raise ValueError
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="1.0.0",
        parameter_types={"p1": "INT"},
        parameter_values={"p2": 5},  # p2 not defined in parameter_types
    )
    with pytest.raises(ValueError, match="pipeline parameter p2 is not found"):
        builder.build() # 17.5μs -> 2.84μs (514% faster)

def test_edge_none_parameter_value_filtered():
    # Edge: None parameter value should be filtered out and not appear in config
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="1.0.0",
        parameter_types={"p1": "INT", "p2": "STRING"},
        parameter_values={"p1": None, "p2": "abc"},
    )
    codeflash_output = builder.build(); config = codeflash_output # 23.2μs -> 2.63μs (781% faster)

def test_edge_unknown_parameter_type():
    # Edge: unknown parameter type should raise TypeError
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="1.0.0",
        parameter_types={"p1": "BOOL"},  # BOOL not supported
        parameter_values={"p1": True},
    )
    with pytest.raises(TypeError, match="Got unknown type of value"):
        builder.build() # 24.0μs -> 3.61μs (565% faster)

def test_edge_schema_version_boundary():
    # Edge: schema_version exactly 2.0.0 uses "parameters" key
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.0.0",
        parameter_types={"p1": "STRING"},
        parameter_values={"p1": "test"},
    )
    codeflash_output = builder.build(); config = codeflash_output # 22.7μs -> 2.52μs (802% faster)

def test_edge_schema_version_above_boundary():
    # Edge: schema_version just above 2.0.0 uses "parameterValues" key
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.0.1",
        parameter_types={"p1": "STRING"},
        parameter_values={"p1": "test"},
    )
    codeflash_output = builder.build(); config = codeflash_output # 21.8μs -> 2.08μs (949% faster)

def test_edge_input_artifacts_empty_and_nonempty():
    # Edge: input_artifacts empty dict and non-empty dict
    builder_empty = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="1.0.0",
        parameter_types={},
        input_artifacts={},
    )
    codeflash_output = builder_empty.build(); config_empty = codeflash_output # 14.1μs -> 1.55μs (805% faster)

    builder_nonempty = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="1.0.0",
        parameter_types={},
        input_artifacts={"foo": "bar"},
    )
    codeflash_output = builder_nonempty.build(); config_nonempty = codeflash_output # 8.87μs -> 1.19μs (648% faster)

def test_edge_parameter_values_with_list_and_dict_v2():
    # Edge: v2 schema allows list/dict parameter values
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={"p1": "STRING", "p2": "STRING"},
        parameter_values={"p1": [1,2,3], "p2": {"a": "b"}},
    )
    codeflash_output = builder.build(); config = codeflash_output # 28.4μs -> 2.17μs (1205% faster)

def test_edge_parameter_types_case_sensitive():
    # Edge: parameter_types are case-sensitive
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="1.0.0",
        parameter_types={"p1": "int"},  # lowercase 'int'
        parameter_values={"p1": 5},
    )
    with pytest.raises(TypeError, match="Got unknown type of value"):
        builder.build() # 22.3μs -> 2.95μs (658% faster)

# ------------------- Large Scale Test Cases -------------------

def test_large_scale_many_parameters_v1():
    # Large scale: many parameters, v1 schema
    n = 500
    parameter_types = {f"p{i}": "INT" for i in range(n)}
    parameter_values = {f"p{i}": i for i in range(n)}
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="1.0.0",
        parameter_types=parameter_types,
        parameter_values=parameter_values,
    )
    codeflash_output = builder.build(); config = codeflash_output # 3.25ms -> 109μs (2874% faster)
    for i in range(n):
        assert config["parameters"][f"p{i}"] == {"intValue": i}

def test_large_scale_many_parameters_v2():
    # Large scale: many parameters, v2 schema
    n = 500
    parameter_types = {f"p{i}": "STRING" for i in range(n)}
    parameter_values = {f"p{i}": str(i) for i in range(n)}
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types=parameter_types,
        parameter_values=parameter_values,
    )
    codeflash_output = builder.build(); config = codeflash_output # 3.15ms -> 55.5μs (5572% faster)
    for i in range(n):
        assert config["parameterValues"][f"p{i}"] == str(i)

def test_large_scale_many_input_artifacts():
    # Large scale: many input_artifacts
    n = 500
    input_artifacts = {f"art{i}": f"id{i}" for i in range(n)}
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={},
        input_artifacts=input_artifacts,
    )
    codeflash_output = builder.build(); config = codeflash_output # 52.9μs -> 37.6μs (40.8% faster)
    for i in range(n):
        assert config["inputArtifacts"][f"art{i}"] == {"artifactId": f"id{i}"}

def test_large_scale_parameter_values_with_large_list_v2():
    # Large scale: v2 schema, parameter value is a large list
    large_list = list(range(1000))
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={"p1": "STRING"},
        parameter_values={"p1": large_list},
    )
    codeflash_output = builder.build(); config = codeflash_output # 23.4μs -> 2.15μs (987% faster)

def test_large_scale_parameter_values_with_large_dict_v2():
    # Large scale: v2 schema, parameter value is a large dict
    large_dict = {str(i): i for i in range(1000)}
    builder = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={"p1": "STRING"},
        parameter_values={"p1": large_dict},
    )
    codeflash_output = builder.build(); config = codeflash_output # 23.3μs -> 2.15μs (983% faster)

# ------------------- Determinism Test Case -------------------

def test_determinism_same_input_same_output():
    # Determinism: Same input yields same output
    builder1 = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={"p": "STRING"},
        parameter_values={"p": "x"},
        input_artifacts={"foo": "bar"},
    )
    builder2 = PipelineRuntimeConfigBuilder(
        pipeline_root="gs://root",
        schema_version="2.1.0",
        parameter_types={"p": "STRING"},
        parameter_values={"p": "x"},
        input_artifacts={"foo": "bar"},
    )
    codeflash_output = builder1.build(); config1 = codeflash_output # 22.7μs -> 2.11μs (976% faster)
    codeflash_output = builder2.build(); config2 = codeflash_output # 14.2μs -> 976ns (1355% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-PipelineRuntimeConfigBuilder.build-mgik0ltq and push.


vertex-sdk-bot and others added 10 commits October 6, 2025 08:53
… in Vertex AI GenAI SDK evals

PiperOrigin-RevId: 815745954
…ion_item` methods to Vertex AI GenAI SDK evals

PiperOrigin-RevId: 815805880
--
e339795 by Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>:

feat:Auto-generated CL for //google/cloud/aiplatform:aiplatform_v1beta1_public_proto_gen

PiperOrigin-RevId: 813083326

Source-Link: googleapis/googleapis@3ecf1f0

Source-Link: googleapis/googleapis-gen@00da0dd
Copy-Tag: eyJwIjoiLmdpdGh1Yi8uT3dsQm90LnlhbWwiLCJoIjoiMDBkYTBkZDIyZDAwZmNkMTExMGE0MGI5MzJiNzI0M2Q4ZmRlZmIzYyJ9

--
efdd99a by Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>:

🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

--
b9fd9fa by Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>:

feat: Auto-generated CL for //google/cloud/aiplatform:aiplatform_v1_public_proto_gen

PiperOrigin-RevId: 813096234

Source-Link: googleapis/googleapis@e78280f

Source-Link: googleapis/googleapis-gen@d83eeb4
Copy-Tag: eyJwIjoiLmdpdGh1Yi8uT3dsQm90LnlhbWwiLCJoIjoiZDgzZWViNGIwNTU5YWQwYjMyNDMxNTJlYzhiOGMyMzEwYzIzYWVjNyJ9

--
e01fdae by Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>:

🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

--
5f2b229 by Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>:

feat: add DeploymentTier enum to DeployedIndex

PiperOrigin-RevId: 813384393

Source-Link: googleapis/googleapis@063f9e1

Source-Link: googleapis/googleapis-gen@1119646
Copy-Tag: eyJwIjoiLmdpdGh1Yi8uT3dsQm90LnlhbWwiLCJoIjoiMTExOTY0NmY5ZTUxOTIyZmJjYThjM2E3YWU3NDNhNmM3OTEzMWMxZSJ9

--
b9d3464 by Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>:

🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

--
fabb82d by Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>:

feat: Add labels field for Predict API for Imagen use case (v1beta)

PiperOrigin-RevId: 815803050

Source-Link: googleapis/googleapis@7f0c1e5

Source-Link: googleapis/googleapis-gen@a452a5d
Copy-Tag: eyJwIjoiLmdpdGh1Yi8uT3dsQm90LnlhbWwiLCJoIjoiYTQ1MmE1ZGRhOGU3MmMzMmQ2MGIwMWU3NTFiZDBjYmRmNmIzYTI2ZSJ9

--
8cf83c2 by Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>:

🦉 Updates from OwlBot post-processor

See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md

COPYBARA_INTEGRATE_REVIEW=googleapis#5863 from googleapis:owl-bot-copy a962595
PiperOrigin-RevId: 815901905
PiperOrigin-RevId: 816325320
…th_events calls.

PiperOrigin-RevId: 816470728
Co-authored-by: release-please[bot] <55107282+release-please[bot]@users.noreply.github.com>
codeflash-ai bot requested a review from mashraf-222 October 8, 2025 22:22
codeflash-ai bot added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) Oct 8, 2025