

@codeflash-ai codeflash-ai bot commented Oct 10, 2025

📄 6% (0.06x) speedup for get_blob_storage_bucket_and_folder in google/cloud/aiplatform/tensorboard/uploader_utils.py

⏱️ Runtime : 40.0 microseconds → 37.9 microseconds (best of 45 runs)

📝 Explanation and details

The optimized code introduces client caching to avoid repeatedly creating expensive storage.Client instances. The key optimization is:

What was optimized:

  • Added a function-level cache (_client_cache) that stores storage.Client instances by project_id
  • Reuses existing clients when the same project_id is encountered multiple times
  • Added forward reference for type annotation ('TensorboardServiceClient') to avoid import overhead

Why this improves performance:
The line profiler shows that storage.Client(project=project_id).bucket(bucket_name) was consuming 99.2% of execution time (27.8 million nanoseconds, about 28 ms) in the original code. Creating a storage.Client involves authentication, connection setup, and other initialization overhead that is expensive to repeat.

By caching clients per project ID, subsequent calls with the same project avoid this initialization cost entirely. The cache lookup is a simple dictionary access which is extremely fast.
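
For illustration, a minimal sketch of what this per-project cache could look like (only _client_cache is named in this change; the helper name _cached_storage_client is hypothetical):

from typing import Dict

from google.cloud import storage

_client_cache: Dict[str, storage.Client] = {}

def _cached_storage_client(project_id: str) -> storage.Client:
    # storage.Client construction is the expensive step (auth, connection
    # setup), so build one instance per project_id and reuse it afterwards.
    client = _client_cache.get(project_id)
    if client is None:
        client = storage.Client(project=project_id)
        _client_cache[project_id] = client
    return client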

Test case performance:
The optimization shows consistent 5-12% speedups across various test scenarios, with the biggest gains (11.7-12.7%) occurring in edge cases like obsolete tensorboards where the function exits early but still needs to handle the blob storage path logic. The caching is most beneficial when the function is called repeatedly with the same project_id, which is a common pattern in batch processing scenarios.
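
Continuing the sketch above, repeated lookups for the same project return the same cached client, so only the first call pays the construction cost (the project name is a placeholder, and running this requires GCP credentials):

client_a = _cached_storage_client("my-project")  # constructs storage.Client
client_b = _cached_storage_client("my-project")  # cache hit, same object
assert client_a is client_b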

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 9 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 85.7% |
🌀 Generated Regression Tests and Runtime
from typing import Optional, Tuple

import grpc
# imports
import pytest  # used for our unit tests
from absl import app
from google.cloud.aiplatform.tensorboard.uploader_utils import \
    get_blob_storage_bucket_and_folder
from google.cloud import storage
from google.cloud.aiplatform.compat.services import tensorboard_service_client

# function to test
# -*- coding: utf-8 -*-

# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#


TensorboardServiceClient = tensorboard_service_client.TensorboardServiceClient

# --- Unit Tests ---

# We'll use pytest's monkeypatch fixture to patch external dependencies
# (see the illustrative success-path sketch under "Basic Test Cases" below).
# We will not use any mocking libraries, only monkeypatch and our own helpers.

# Helper classes to simulate TensorboardServiceClient and Tensorboard objects
class DummyTensorboard:
    def __init__(self, blob_storage_path_prefix=None):
        self.blob_storage_path_prefix = blob_storage_path_prefix

class DummyTensorboardServiceClient:
    def __init__(self, tensorboards):
        # tensorboards: dict mapping resource name to DummyTensorboard
        self.tensorboards = tensorboards

    def get_tensorboard(self, name):
        if name not in self.tensorboards:
            # Simulate NOT_FOUND error
            error = grpc.RpcError()
            # Patch code() method to return NOT_FOUND
            error.code = lambda: grpc.StatusCode.NOT_FOUND
            raise error
        # Return the tensorboard as-is; an obsolete tensorboard is simulated
        # by a blob_storage_path_prefix of None (or an empty string).
        return self.tensorboards[name]

# Helper to patch storage.Client().bucket()
class DummyBucket:
    def __init__(self, name):
        self.name = name

# --- Basic Test Cases ---
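
# For illustration only (not among the 9 generated tests counted above): a
# successful-path test could monkeypatch storage.Client with a stub so that no
# real GCP client is created. This is a hedged sketch; it assumes the function
# returns a (bucket, folder) pair split from blob_storage_path_prefix, and the
# stub and test names are hypothetical.

class _StubStorageClient:
    """Stand-in for storage.Client that hands back DummyBucket objects."""

    def __init__(self, project=None):
        self.project = project

    def bucket(self, name):
        return DummyBucket(name)


def test_success_path_sketch(monkeypatch):
    # Patch the shared google.cloud.storage module so uploader_utils sees the stub.
    monkeypatch.setattr(storage, "Client", _StubStorageClient)
    resource_name = "projects/123/locations/us-central1/tensorboards/1"
    tb = DummyTensorboard(blob_storage_path_prefix="my-bucket/my-folder")
    client = DummyTensorboardServiceClient({resource_name: tb})
    bucket, folder = get_blob_storage_bucket_and_folder(
        client, resource_name, "sketch-project"
    )
    assert bucket.name == "my-bucket"
    assert folder.startswith("my-folder")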




def test_tensorboard_not_found(monkeypatch):
    """
    Edge: Tensorboard resource does not exist.
    """
    resource_name = "projects/123/locations/us-central1/tensorboards/404"
    project_id = "test-project"
    client = DummyTensorboardServiceClient({})  # No tensorboards

    with pytest.raises(app.UsageError) as excinfo:
        get_blob_storage_bucket_and_folder(client, resource_name, project_id) # 5.76μs -> 5.46μs (5.38% faster)

def test_obsolete_tensorboard(monkeypatch):
    """
    Edge: Tensorboard exists but has no blob_storage_path_prefix (obsolete).
    """
    resource_name = "projects/123/locations/us-central1/tensorboards/obsolete"
    project_id = "test-project"
    tb = DummyTensorboard(blob_storage_path_prefix=None)
    client = DummyTensorboardServiceClient({resource_name: tb})

    with pytest.raises(app.UsageError) as excinfo:
        get_blob_storage_bucket_and_folder(client, resource_name, project_id) # 3.61μs -> 3.20μs (12.7% faster)



def test_empty_blob_storage_path_prefix(monkeypatch):
    """
    Edge: blob_storage_path_prefix is empty string.
    """
    resource_name = "projects/123/locations/us-central1/tensorboards/empty"
    project_id = "test-project"
    tb = DummyTensorboard(blob_storage_path_prefix="")
    client = DummyTensorboardServiceClient({resource_name: tb})

    # Should treat as obsolete
    with pytest.raises(app.UsageError) as excinfo:
        get_blob_storage_bucket_and_folder(client, resource_name, project_id) # 4.24μs -> 3.79μs (11.7% faster)







def test_large_scale_tensorboard_not_found(monkeypatch):
    """
    Large Scale: Many tensorboards, request a non-existent one.
    """
    project_id = "test-project"
    tensorboards = {}
    for i in range(999):
        name = f"projects/123/locations/us-central1/tensorboards/{i}"
        tensorboards[name] = DummyTensorboard(blob_storage_path_prefix=f"bucket{i}/folder{i}")
    client = DummyTensorboardServiceClient(tensorboards)
    resource_name = "projects/123/locations/us-central1/tensorboards/doesnotexist"

    with pytest.raises(app.UsageError) as excinfo:
        get_blob_storage_bucket_and_folder(client, resource_name, project_id) # 6.08μs -> 5.72μs (6.36% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import Optional, Tuple

import grpc
# imports
import pytest  # used for our unit tests
from absl import app
from google.cloud.aiplatform.tensorboard.uploader_utils import \
    get_blob_storage_bucket_and_folder
from google.cloud import storage
from google.cloud.aiplatform.compat.services import tensorboard_service_client

# function to test
# -*- coding: utf-8 -*-

# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#


TensorboardServiceClient = tensorboard_service_client.TensorboardServiceClient

# unit tests

# Helper Classes for mocking
class MockTensorboard:
    def __init__(self, blob_storage_path_prefix=None):
        self.blob_storage_path_prefix = blob_storage_path_prefix

class MockRpcError(grpc.RpcError):
    def __init__(self, code):
        self._code = code
    def code(self):
        return self._code

class MockTensorboardServiceClient:
    def __init__(self, tensorboards):
        # tensorboards: dict mapping resource_name -> MockTensorboard, or an
        # exception instance to raise
        self.tensorboards = tensorboards
    def get_tensorboard(self, name):
        tb = self.tensorboards.get(name)
        if isinstance(tb, Exception):
            raise tb
        if tb is None:
            raise MockRpcError(grpc.StatusCode.NOT_FOUND)
        return tb

class MockBucket:
    def __init__(self, name):
        self.name = name

# Basic Test Cases




def test_tensorboard_not_found_raises_usage_error():
    # Scenario: Tensorboard does not exist, should raise UsageError
    tensorboard_resource_name = "tb/missing"
    project_id = "project3"
    api_client = MockTensorboardServiceClient({})
    with pytest.raises(app.UsageError) as excinfo:
        get_blob_storage_bucket_and_folder(api_client, tensorboard_resource_name, project_id) # 5.99μs -> 5.60μs (7.04% faster)

def test_tensorboard_rpc_error_other_than_not_found():
    # Scenario: Unexpected RpcError (e.g., PERMISSION_DENIED), should propagate
    tensorboard_resource_name = "tb/error"
    project_id = "project4"
    api_client = MockTensorboardServiceClient({
        tensorboard_resource_name: MockRpcError(grpc.StatusCode.PERMISSION_DENIED)
    })
    with pytest.raises(MockRpcError) as excinfo:
        get_blob_storage_bucket_and_folder(api_client, tensorboard_resource_name, project_id) # 2.33μs -> 2.31μs (0.823% faster)

def test_tensorboard_obsolete_raises_usage_error():
    # Scenario: blob_storage_path_prefix is None, should raise UsageError
    tensorboard_resource_name = "tb/obsolete"
    project_id = "project5"
    api_client = MockTensorboardServiceClient({
        tensorboard_resource_name: MockTensorboard(None)
    })
    with pytest.raises(app.UsageError) as excinfo:
        get_blob_storage_bucket_and_folder(api_client, tensorboard_resource_name, project_id) # 3.41μs -> 3.24μs (5.34% faster)

def test_tensorboard_blob_storage_path_prefix_empty_string():
    # Scenario: blob_storage_path_prefix is empty string, should raise UsageError
    tensorboard_resource_name = "tb/empty"
    project_id = "project6"
    api_client = MockTensorboardServiceClient({
        tensorboard_resource_name: MockTensorboard("")
    })
    with pytest.raises(app.UsageError) as excinfo:
        get_blob_storage_bucket_and_folder(api_client, tensorboard_resource_name, project_id) # 2.82μs -> 2.90μs (2.89% slower)









def test_large_scale_tensorboard_not_found_among_many():
    # Scenario: Tensorboard not found among many tensorboards
    n = 200
    tensorboards = {}
    for i in range(n):
        resource_name = f"tb/many/{i}"
        tensorboards[resource_name] = MockTensorboard(f"bucket{i}/folder{i}")
    api_client = MockTensorboardServiceClient(tensorboards)
    missing_resource_name = "tb/many/missing"
    project_id = "project_many"
    with pytest.raises(app.UsageError) as excinfo:
        get_blob_storage_bucket_and_folder(api_client, missing_resource_name, project_id) # 5.73μs -> 5.63μs (1.92% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-get_blob_storage_bucket_and_folder-mgkjv2cc and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 October 10, 2025 07:53
@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 10, 2025