@lmossman lmossman commented Jul 28, 2025

This PR fixes a couple of issues which were uncovered upon the migration of the Builder UI to SchemaForm.

These issues surfaced because the SchemaForm version of the Builder UI passes the whole manifest to the CDK for a test read, rather than stripping it down to only the stream being tested. It does so because it now relies on $refs between streams that the CDK must resolve when performing the test read.

This uncovered 2 issues that this PR fixes:

  1. The wrong primary_key and cursor_field values could be used, because the run_test_read function was naively grabbing them from the first stream, rather than finding the stream whose name matches the stream being tested and pulling them from that.
  2. Since the instantiation of declarative sources involves making API requests if the manifest contains dynamic streams with HttpComponentsResolvers, there were extra HTTP requests being logged even if the dynamic stream wasn't the one being tested.
    • This screwed up the message grouping logic, since the HTTP page requests for the dynamic stream resolvers made the message grouper think that pages/slices were completed for the non-dynamic streams
    • The fix was to start setting the stream name for these logs, and make the message grouper ignore those HTTP page logs if they are for a different stream name
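
The first fix above can be illustrated with a minimal sketch. It is not the code from this PR: StreamInfo and find_stream are hypothetical stand-ins, and the only behavior assumed is the one described above (look the stream up by name instead of taking the first one, and tolerate a missing match).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StreamInfo:
    """Hypothetical stand-in for a declarative stream; not a CDK class."""

    name: str
    primary_key: List[str] = field(default_factory=list)
    cursor_field: List[str] = field(default_factory=list)


def find_stream(streams: List[StreamInfo], stream_name: str) -> Optional[StreamInfo]:
    """Return the stream whose name matches, or None if nothing matches."""
    return next((s for s in streams if s.name == stream_name), None)


streams = [
    StreamInfo(name="users", primary_key=["id"], cursor_field=["updated_at"]),
    StreamInfo(name="orders", primary_key=["order_id"], cursor_field=["created_at"]),
]

# Old behavior: metadata always came from the first stream in the manifest.
naive = streams[0]

# Fixed behavior: metadata comes from the stream that is actually being tested.
stream = find_stream(streams, "orders")
primary_key = stream.primary_key if stream else []
cursor_field = stream.cursor_field if stream else []
```

With the whole manifest passed in, the naive lookup would report the primary key of "users" while testing "orders"; the name-based lookup avoids that mismatch.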

I also fixed the logging for async streams, so that the download request/response is now properly logged with the stream name included, so it is correctly shown in the Builder UI.

Summary by CodeRabbit

  • New Features

    • Improved support for reading from a specific data stream by name, rather than defaulting to the first available stream.
    • Enhanced filtering to ignore page HTTP requests from streams other than the one currently being read, resulting in more accurate message processing.
  • Bug Fixes

    • Prevented messages from unrelated streams from affecting data grouping and processing.
  • Refactor

    • Updated several internal methods to accept and utilize stream names for more precise operations.
    • Unified stream name usage in tests for consistency and clarity.


👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@lmossman/fix-builder-test-read#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch lmossman/fix-builder-test-read

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe <command> - Runs any poe command in the CDK environment



github-actions bot commented Jul 28, 2025

PyTest Results (Fast)

3 700 tests  ±0   3 689 ✅ ±0   6m 35s ⏱️ -9s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 833e689. ± Comparison against base commit 51cfea5.

♻️ This comment has been updated with latest results.


github-actions bot commented Jul 28, 2025

PyTest Results (Full)

3 703 tests  ±0   3 692 ✅ ±0   18m 5s ⏱️ -6s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 833e689. ± Comparison against base commit 51cfea5.

♻️ This comment has been updated with latest results.

@lmossman lmossman changed the title from "Lmossman/fix builder test read" to "fix: make builder test read work when a mix of static and dynamic streams are passed in" Jul 29, 2025
@lmossman lmossman requested a review from brianjlai July 29, 2025 00:31
@lmossman lmossman marked this pull request as ready for review July 29, 2025 00:32
@Copilot Copilot AI review requested due to automatic review settings July 29, 2025 00:32
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR fixes issues with the Builder test read functionality when mixing static and dynamic streams by ensuring correct stream identification and filtering out irrelevant HTTP requests from dynamic stream resolvers.

  • Fixed stream identification to find the correct stream by name instead of using the first stream
  • Updated HTTP component resolvers to include stream name context for proper request filtering
  • Added message filtering to ignore HTTP page requests from different streams during test reads

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

File | Description
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py | Updated HTTP components resolver to accept and use stream name parameter
airbyte_cdk/sources/declarative/manifest_declarative_source.py | Pass stream name to HTTP components resolver during dynamic stream creation
airbyte_cdk/connector_builder/test_reader/reader.py | Modified to find stream by name and handle null stream cases
airbyte_cdk/connector_builder/test_reader/message_grouper.py | Added filtering to skip HTTP requests from different streams
airbyte_cdk/connector_builder/test_reader/helpers.py | Added helper function to identify HTTP requests from different streams
airbyte_cdk/connector_builder/connector_builder_handler.py | Updated function call to pass stream name parameter

Contributor

coderabbitai bot commented Jul 29, 2025

📝 Walkthrough

Walkthrough

This change updates several internal APIs to ensure that stream-specific operations correctly use the stream name. The main adjustments include passing the stream_name parameter through multiple layers of the connector builder logic, updating function and method signatures, and filtering messages to only process those relevant to the specified stream.

Changes

Cohort | File(s) | Change Summary
Connector Builder Handler Refactor | airbyte_cdk/connector_builder/connector_builder_handler.py | Updated the call to test_read_handler.run_test_read in read_stream to include the stream_name argument, aligning with downstream signature changes.
Test Reader Stream Filtering | airbyte_cdk/connector_builder/test_reader/helpers.py, airbyte_cdk/connector_builder/test_reader/message_grouper.py, airbyte_cdk/connector_builder/test_reader/reader.py | Added a helper to detect page HTTP requests for different streams; updated message grouping to skip irrelevant messages; changed run_test_read to accept and use stream_name for targeted stream selection and message grouping.
Declarative Source and Component Factory Update | airbyte_cdk/sources/declarative/manifest_declarative_source.py, airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py | Passed stream_name through dynamic stream config creation and updated the component factory to accept and use stream_name when creating HTTP component resolvers.
Unit Tests Stream Name Integration | unit_tests/connector_builder/test_connector_builder_handler.py, unit_tests/connector_builder/test_message_grouper.py | Updated tests to include stream_name parameter in calls and mocks; replaced hardcoded stream name literals with variables to unify stream name usage across tests.

Sequence Diagram(s)

sequenceDiagram
    participant Handler as ConnectorBuilderHandler
    participant TestReader as TestReader
    participant Grouper as MessageGrouper
    participant Helpers as Helpers

    Handler->>TestReader: run_test_read(..., stream_name, ...)
    TestReader->>Grouper: get_message_groups(..., stream_name)
    loop For each message
        Grouper->>Helpers: is_page_http_request_for_different_stream(message, stream_name)
        alt If true
            Grouper-->>Grouper: Skip message
        else
            Grouper-->>Grouper: Process message
        end
    end
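
The loop in the sequence diagram can be sketched as follows. This is a simplified illustration, not the helper added in this PR: the function name is taken from the diagram, but the log shape (a dict with an "http" key and a flat "stream_name" key) and the relevant_messages generator are assumptions made for the example, and the real CDK log layout may differ.

```python
from typing import Any, Dict, Iterable, Iterator


def is_page_http_request_for_different_stream(
    json_message: Dict[str, Any], stream_name: str
) -> bool:
    """Return True for an HTTP log that names a stream other than the one under test."""
    # Assumed log shape: {"http": {...}, "stream_name": "..."}.
    if "http" not in json_message:
        return False
    message_stream = json_message.get("stream_name")
    # Logs that carry no stream name are kept rather than dropped.
    return message_stream is not None and message_stream != stream_name


def relevant_messages(
    messages: Iterable[Dict[str, Any]], stream_name: str
) -> Iterator[Dict[str, Any]]:
    """Yield only the messages the grouper should process for stream_name."""
    for message in messages:
        if is_page_http_request_for_different_stream(message, stream_name):
            continue  # Skip page requests made on behalf of another stream.
        yield message


logs = [
    {"http": {"url": "https://api.example.com/users?page=1"}, "stream_name": "users"},
    {"http": {"url": "https://api.example.com/resolver"}, "stream_name": "dynamic_items"},
    {"message": "plain log without an http section"},
]
kept = list(relevant_messages(logs, "users"))
```

Skipping early keeps resolver traffic from being counted as a completed page or slice of the stream being tested, which is the grouping problem described in the PR.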

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~18 minutes

Suggested labels

enhancement, testing

Suggested reviewers

  • maxi297
  • brianjlai

Would you like to see additional test coverage for the new stream filtering logic, or do you feel the current updates are sufficient for now? Wdyt?

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b04421 and 833e689.

📒 Files selected for processing (2)
  • airbyte_cdk/sources/declarative/manifest_declarative_source.py (1 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • airbyte_cdk/sources/declarative/manifest_declarative_source.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Pytest (Fast)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Analyze (python)
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
airbyte_cdk/connector_builder/test_reader/message_grouper.py (2)

6-40: Fix import sorting to resolve linter error

The linter is flagging that the import block is unsorted. Would you mind running ruff --fix to organize the imports properly? wdyt?


48-49: Consider updating the docstring to document the new parameter

The function signature now includes stream_name: str, but the docstring doesn't mention this parameter. Would you like to add documentation for it in the Parameters section? wdyt?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 51cfea5 and 9f1b532.

📒 Files selected for processing (6)
  • airbyte_cdk/connector_builder/connector_builder_handler.py (1 hunks)
  • airbyte_cdk/connector_builder/test_reader/helpers.py (1 hunks)
  • airbyte_cdk/connector_builder/test_reader/message_grouper.py (3 hunks)
  • airbyte_cdk/connector_builder/test_reader/reader.py (3 hunks)
  • airbyte_cdk/sources/declarative/manifest_declarative_source.py (1 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/spec.json:9-15
Timestamp: 2024-11-15T00:59:08.154Z
Learning: When code in `airbyte_cdk/cli/source_declarative_manifest/` is being imported from another repository, avoid suggesting modifications to it during the import process.
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
airbyte_cdk/sources/declarative/manifest_declarative_source.py (4)

Learnt from: aaronsteers
PR: #174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py file, the strict module name checks in _get_class_from_fully_qualified_class_name (requiring module_name to be "components" and module_name_full to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.

🪛 GitHub Actions: Linters
airbyte_cdk/connector_builder/test_reader/message_grouper.py

[error] 6-40: Ruff: Import block is un-sorted or un-formatted. Organize imports. 1 fixable error with the --fix option.

airbyte_cdk/connector_builder/test_reader/helpers.py

[error] 293-293: mypy error: Item "None" of "dict[str, Any] | None" has no attribute "get" [union-attr]
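
A union-attr error of this kind is usually resolved by narrowing the optional value before calling .get on it. The sketch below shows that general pattern only; stream_name_from_log and the "stream_name" key are hypothetical and are not taken from helpers.py.

```python
from typing import Any, Dict, Optional


def stream_name_from_log(json_message: Optional[Dict[str, Any]]) -> Optional[str]:
    """Read a stream name from a log dict that may be None."""
    # Without this guard, mypy reports union-attr on the .get call below,
    # because json_message is typed as "dict[str, Any] | None".
    if json_message is None:
        return None
    value = json_message.get("stream_name")
    # Only return real strings so the declared return type holds.
    return value if isinstance(value, str) else None
```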

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Check: source-shopify
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (8)
airbyte_cdk/sources/declarative/manifest_declarative_source.py (1)

549-549: LGTM! Clean implementation of stream name propagation.

This change correctly passes the stream name from the dynamic definition to the component factory, which aligns perfectly with the PR's goal of enabling stream-specific processing for mixed static/dynamic streams. The use of .get("name") is a nice defensive touch that handles potential missing names gracefully.

airbyte_cdk/connector_builder/connector_builder_handler.py (1)

110-117: Perfect! Stream name now correctly passed to test reader.

This change properly addresses one of the key issues mentioned in the PR - ensuring that the correct stream is identified for testing rather than defaulting to the first stream in the manifest. The stream name extraction from configured_catalog.streams[0].stream.name makes sense since the connector builder supports single stream testing, and this will now enable proper stream-specific filtering downstream.

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

3807-3816: LGTM! Nice improvement for stream-specific component naming.

This change enables much better debugging and logging by associating HTTP components resolvers with their specific streams. The fallback to "__http_components_resolver" when no stream name is provided is a thoughtful touch that maintains backward compatibility while providing a descriptive default. This should help significantly with the message grouping issues mentioned in the PR objectives, wdyt?
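
The fallback described in this comment can be sketched as below. Only the default string "__http_components_resolver" and the use of .get("name") come from the review text; resolver_name and the dynamic_definition dicts are hypothetical, and the factory in the CDK may apply the name differently.

```python
from typing import Any, Dict, Optional

DEFAULT_RESOLVER_NAME = "__http_components_resolver"


def resolver_name(stream_name: Optional[str] = None) -> str:
    """Use the stream name when one is given, otherwise the generic default."""
    # "or" also treats an empty string as missing, which is assumed acceptable here.
    return stream_name or DEFAULT_RESOLVER_NAME


# .get("name") yields None for a definition without a name, which triggers the fallback.
named_definition: Dict[str, Any] = {"name": "orders"}
unnamed_definition: Dict[str, Any] = {}

named = resolver_name(named_definition.get("name"))
unnamed = resolver_name(unnamed_definition.get("name"))
```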

airbyte_cdk/connector_builder/test_reader/message_grouper.py (1)

101-102: LGTM! Clean filtering logic for stream-specific processing

This filtering logic perfectly addresses the issue described in the PR where HTTP page requests from dynamic streams were interfering with message grouping for other streams. The early continue ensures clean separation of concerns.

airbyte_cdk/connector_builder/test_reader/reader.py (4)

89-89: Acknowledge the necessary breaking change to support stream-specific reads

Adding the stream_name parameter represents a breaking change to this method signature. Given that this addresses the core issue described in the PR (selecting the correct stream instead of defaulting to the first one), this change seems necessary and justified. The impact should be limited since this appears to be an internal API for the connector builder, wdyt?


116-117: Excellent fix for stream selection logic

This change directly addresses the issue described in the PR where primary_key and cursor_field were incorrectly extracted from the first stream. The new logic correctly finds the stream by name, which ensures the right stream metadata is used for schema inference. Nice work!


123-127: Good defensive programming with null-safe stream property access

The null-safe handling for stream.primary_key and stream.cursor_field is well-implemented. This prevents potential AttributeError exceptions when no matching stream is found, which could happen if an invalid stream_name is passed. Solid defensive coding approach!


135-135: LGTM! Proper propagation of stream_name for filtering

This correctly passes the stream_name to get_message_groups, enabling the stream-specific message filtering implemented in the message grouper. The parameter threading is consistent throughout the call chain.

@github-actions github-actions bot added the bug (Something isn't working) and security labels Jul 29, 2025
Contributor

@brianjlai brianjlai left a comment

one suggestion, non-blocking

@lmossman lmossman merged commit a59d25f into main Jul 29, 2025
24 of 25 checks passed
@lmossman lmossman deleted the lmossman/fix-builder-test-read branch July 29, 2025 23:15