-
Notifications
You must be signed in to change notification settings - Fork 30
fix: make builder test read work when a mix of static and dynamic streams are passed in #676
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@lmossman/fix-builder-test-read#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch lmossman/fix-builder-test-read Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes issues with the Builder test read functionality when mixing static and dynamic streams by ensuring correct stream identification and filtering out irrelevant HTTP requests from dynamic stream resolvers.
- Fixed stream identification to find the correct stream by name instead of using the first stream
- Updated HTTP component resolvers to include stream name context for proper request filtering
- Added message filtering to ignore HTTP page requests from different streams during test reads
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py |
Updated HTTP components resolver to accept and use stream name parameter |
airbyte_cdk/sources/declarative/manifest_declarative_source.py |
Pass stream name to HTTP components resolver during dynamic stream creation |
airbyte_cdk/connector_builder/test_reader/reader.py |
Modified to find stream by name and handle null stream cases |
airbyte_cdk/connector_builder/test_reader/message_grouper.py |
Added filtering to skip HTTP requests from different streams |
airbyte_cdk/connector_builder/test_reader/helpers.py |
Added helper function to identify HTTP requests from different streams |
airbyte_cdk/connector_builder/connector_builder_handler.py |
Updated function call to pass stream name parameter |
📝 WalkthroughWalkthroughThis change updates several internal APIs to ensure that stream-specific operations correctly use the stream name. The main adjustments include passing the Changes
Sequence Diagram(s)sequenceDiagram
participant Handler as ConnectorBuilderHandler
participant TestReader as TestReader
participant Grouper as MessageGrouper
participant Helpers as Helpers
Handler->>TestReader: run_test_read(..., stream_name, ...)
TestReader->>Grouper: get_message_groups(..., stream_name)
loop For each message
Grouper->>Helpers: is_page_http_request_for_different_stream(message, stream_name)
alt If true
Grouper-->>Grouper: Skip message
else
Grouper-->>Grouper: Process message
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~18 minutes Suggested labels
Suggested reviewers
Would you like to see additional test coverage for the new stream filtering logic, or do you feel the current updates are sufficient for now? Wdyt? Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
airbyte_cdk/connector_builder/test_reader/message_grouper.py (2)
6-40
: Fix import sorting to resolve linter errorThe linter is flagging that the import block is unsorted. Would you mind running
ruff --fix
to organize the imports properly? wdyt?
48-49
: Consider updating the docstring to document the new parameterThe function signature now includes
stream_name: str
, but the docstring doesn't mention this parameter. Would you like to add documentation for it in the Parameters section? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
airbyte_cdk/connector_builder/connector_builder_handler.py
(1 hunks)airbyte_cdk/connector_builder/test_reader/helpers.py
(1 hunks)airbyte_cdk/connector_builder/test_reader/message_grouper.py
(3 hunks)airbyte_cdk/connector_builder/test_reader/reader.py
(3 hunks)airbyte_cdk/sources/declarative/manifest_declarative_source.py
(1 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
(1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/spec.json:9-15
Timestamp: 2024-11-15T00:59:08.154Z
Learning: When code in `airbyte_cdk/cli/source_declarative_manifest/` is being imported from another repository, avoid suggesting modifications to it during the import process.
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
airbyte_cdk/sources/declarative/manifest_declarative_source.py (4)
Learnt from: ChristoGrab
PR: #58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the YamlDeclarativeSource
class in airbyte_cdk/sources/declarative/yaml_declarative_source.py
, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Learnt from: aaronsteers
PR: #58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in airbyte_cdk/cli/source_declarative_manifest/
, including _run.py
, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Learnt from: aaronsteers
PR: #58
File: airbyte_cdk/cli/source_declarative_manifest/spec.json:9-15
Timestamp: 2024-11-15T00:59:08.154Z
Learning: When code in airbyte_cdk/cli/source_declarative_manifest/
is being imported from another repository, avoid suggesting modifications to it during the import process.
Learnt from: aaronsteers
PR: #174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
file, the strict module name checks in _get_class_from_fully_qualified_class_name
(requiring module_name
to be "components" and module_name_full
to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
Learnt from: aaronsteers
PR: #174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
file, the strict module name checks in _get_class_from_fully_qualified_class_name
(requiring module_name
to be "components" and module_name_full
to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.
Learnt from: ChristoGrab
PR: #58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the YamlDeclarativeSource
class in airbyte_cdk/sources/declarative/yaml_declarative_source.py
, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
airbyte_cdk/connector_builder/connector_builder_handler.py (1)
Learnt from: aaronsteers
PR: #58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in airbyte_cdk/cli/source_declarative_manifest/
, including _run.py
, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
airbyte_cdk/connector_builder/test_reader/reader.py (2)
Learnt from: ChristoGrab
PR: #58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the YamlDeclarativeSource
class in airbyte_cdk/sources/declarative/yaml_declarative_source.py
, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Learnt from: aaronsteers
PR: #58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in airbyte_cdk/cli/source_declarative_manifest/
, including _run.py
, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
🪛 GitHub Actions: Linters
airbyte_cdk/connector_builder/test_reader/message_grouper.py
[error] 6-40: Ruff: Import block is un-sorted or un-formatted. Organize imports. 1 fixable error with the --fix
option.
airbyte_cdk/connector_builder/test_reader/helpers.py
[error] 293-293: mypy error: Item "None" of "dict[str, Any] | None" has no attribute "get" [union-attr]
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (8)
airbyte_cdk/sources/declarative/manifest_declarative_source.py (1)
549-549
: LGTM! Clean implementation of stream name propagation.This change correctly passes the stream name from the dynamic definition to the component factory, which aligns perfectly with the PR's goal of enabling stream-specific processing for mixed static/dynamic streams. The use of
.get("name")
is a nice defensive touch that handles potential missing names gracefully.airbyte_cdk/connector_builder/connector_builder_handler.py (1)
110-117
: Perfect! Stream name now correctly passed to test reader.This change properly addresses one of the key issues mentioned in the PR - ensuring that the correct stream is identified for testing rather than defaulting to the first stream in the manifest. The stream name extraction from
configured_catalog.streams[0].stream.name
makes sense since the connector builder supports single stream testing, and this will now enable proper stream-specific filtering downstream.airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
3807-3816
: LGTM! Nice improvement for stream-specific component naming.This change enables much better debugging and logging by associating HTTP components resolvers with their specific streams. The fallback to
"__http_components_resolver"
when no stream name is provided is a thoughtful touch that maintains backward compatibility while providing a descriptive default. This should help significantly with the message grouping issues mentioned in the PR objectives, wdyt?airbyte_cdk/connector_builder/test_reader/message_grouper.py (1)
101-102
: LGTM! Clean filtering logic for stream-specific processingThis filtering logic perfectly addresses the issue described in the PR where HTTP page requests from dynamic streams were interfering with message grouping for other streams. The early
continue
ensures clean separation of concerns.airbyte_cdk/connector_builder/test_reader/reader.py (4)
89-89
: Acknowledge the necessary breaking change to support stream-specific readsAdding the
stream_name
parameter represents a breaking change to this method signature. Given that this addresses the core issue described in the PR (selecting the correct stream instead of defaulting to the first one), this change seems necessary and justified. The impact should be limited since this appears to be an internal API for the connector builder, wdyt?
116-117
: Excellent fix for stream selection logicThis change directly addresses the issue described in the PR where
primary_key
andcursor_field
were incorrectly extracted from the first stream. The new logic correctly finds the stream by name, which ensures the right stream metadata is used for schema inference. Nice work!
123-127
: Good defensive programming with null-safe stream property accessThe null-safe handling for
stream.primary_key
andstream.cursor_field
is well-implemented. This prevents potentialAttributeError
exceptions when no matching stream is found, which could happen if an invalidstream_name
is passed. Solid defensive coding approach!
135-135
: LGTM! Proper propagation of stream_name for filteringThis correctly passes the
stream_name
toget_message_groups
, enabling the stream-specific message filtering implemented in the message grouper. The parameter threading is consistent throughout the call chain.
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one suggestion, non-blocking
This PR fixes a couple of issues which were uncovered upon the migration of the Builder UI to SchemaForm.
The reason these issues are now apparent is because the SchemaForm version of the Builder UI passes the whole manifest to the CDK for a test read, rather than just stripping it down to only the stream being tested, since it now relies on
$refs
between streams that the CDK must resolve when performing the test read.This uncovered 2 issues that this PR fixes:
primary_key
andcursor_field
values could be used, because therun_test_read
function was naively grabbing them from the first stream, rather than finding the stream that matches to the stream name being tested and pulling them from that.I also fixed the logging for async streams, so that the download request/response is now properly logged with the stream name included, so it is correctly shown in the Builder UI.
Summary by CodeRabbit
New Features
Bug Fixes
Refactor