-
Notifications
You must be signed in to change notification settings - Fork 30
chore: have declarative availability check support AbstractStream #686
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: have declarative availability check support AbstractStream #686
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/availability_strategy_to_support_abstract_stream#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/availability_strategy_to_support_abstract_stream Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
/autofix
|
PyTest Results (Fast)3 697 tests +2 3 686 ✅ +2 6m 38s ⏱️ +3s Results for commit 2bc4b30. ± Comparison against base commit f976c72. This pull request removes 5 and adds 7 tests. Note that renamed tests count towards both.
♻️ This comment has been updated with latest results. |
PyTest Results (Full)3 700 tests +2 3 689 ✅ +2 11m 39s ⏱️ +6s Results for commit 2bc4b30. ± Comparison against base commit f976c72. This pull request removes 5 and adds 7 tests. Note that renamed tests count towards both.
♻️ This comment has been updated with latest results. |
/autofix
|
…ed' into maxi297/availability_strategy_to_support_abstract_stream
📝 WalkthroughWalkthroughThis set of changes refactors the availability checking mechanism across several modules. It removes deprecated and strategy-based availability classes, consolidates availability representation, updates type hints, and migrates to direct, inline, or standalone function-based availability checks for streams. Related test suites are updated to match the new approach. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CheckStream
participant Stream (or AbstractStream)
User->>CheckStream: check_connection()
CheckStream->>Stream: evaluate_availability(stream, logger)
alt Stream is legacy
CheckStream->>Stream: HttpAvailabilityStrategy().check_availability()
else Stream is new
CheckStream->>Stream: stream.check_availability()
end
Stream-->>CheckStream: (is_available, reason)
CheckStream-->>User: (result)
sequenceDiagram
participant DefaultStream
participant PartitionGenerator
participant Partition
DefaultStream->>PartitionGenerator: generate_partitions()
alt No partitions
DefaultStream-->>Caller: StreamAvailability.unavailable("No partitions generated")
else Exception during partitioning
DefaultStream-->>Caller: StreamAvailability.unavailable(error_message)
else
DefaultStream->>Partition: read_records()
alt Exception during reading
DefaultStream-->>Caller: StreamAvailability.unavailable(error_message)
else
DefaultStream-->>Caller: StreamAvailability.available()
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~18 minutes Possibly related PRs
Would you like to cross-reference any of these PRs in your summary or changelog, wdyt? Suggested labels
Would you like to add any additional labels for visibility, wdyt? Suggested reviewers
Anyone else you'd like to include for extra eyes on these refactors, wdyt? Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
airbyte_cdk/sources/streams/concurrent/availability_strategy.py (1)
10-16
: Address naming consistencyAs mentioned in a previous review, for consistency both
available
andunavailable
methods should use the same convention - either both usecls
or both useStreamAvailability
. Currently one usesStreamAvailability
and one usescls
.@classmethod def available(cls) -> "StreamAvailability": - return cls(True) + return StreamAvailability(True)Or alternatively:
@classmethod def unavailable(cls, reason: str) -> "StreamAvailability": - return cls(False, reason) + return StreamAvailability(False, reason)
🧹 Nitpick comments (3)
airbyte_cdk/sources/streams/availability_strategy.py (1)
17-17
: Make the FIXME comment more specific, wdyt?The current comment
# FIXME this
is quite vague. Since this appears to be part of the availability strategy refactoring mentioned in the PR, could you make it more descriptive? Something like# FIXME: Consider deprecating this class as part of AbstractStream migration
would be more actionable.-# FIXME this +# FIXME: Consider deprecating this class as part of AbstractStream migrationairbyte_cdk/sources/declarative/checks/check_dynamic_stream.py (1)
13-13
: Remove unused import, wdyt?It looks like
HttpAvailabilityStrategy
is no longer used since the code now usesevaluate_availability
function on line 47. Should we remove this import to clean up the unused reference?-from airbyte_cdk.sources.streams.http.availability_strategy import HttpAvailabilityStrategy
unit_tests/sources/streams/concurrent/test_default_stream.py (1)
259-259
: Consider usingis False
for boolean comparisonFor more Pythonic code, consider using
is False
instead of== False
for boolean comparisons. The same applies tois True
comparisons throughout the test file, wdyt?- assert availability.is_available == False + assert availability.is_available is False
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
airbyte_cdk/sources/declarative/checks/check_dynamic_stream.py
(2 hunks)airbyte_cdk/sources/declarative/checks/check_stream.py
(5 hunks)airbyte_cdk/sources/declarative/concurrent_declarative_source.py
(0 hunks)airbyte_cdk/sources/file_based/availability_strategy/__init__.py
(1 hunks)airbyte_cdk/sources/file_based/availability_strategy/abstract_file_based_availability_strategy.py
(1 hunks)airbyte_cdk/sources/file_based/stream/abstract_file_based_stream.py
(0 hunks)airbyte_cdk/sources/file_based/stream/concurrent/adapters.py
(0 hunks)airbyte_cdk/sources/streams/availability_strategy.py
(1 hunks)airbyte_cdk/sources/streams/concurrent/abstract_stream.py
(1 hunks)airbyte_cdk/sources/streams/concurrent/adapters.py
(0 hunks)airbyte_cdk/sources/streams/concurrent/availability_strategy.py
(1 hunks)airbyte_cdk/sources/streams/concurrent/default_stream.py
(2 hunks)unit_tests/sources/declarative/checks/test_check_stream.py
(4 hunks)unit_tests/sources/streams/concurrent/scenarios/thread_based_concurrent_stream_scenarios.py
(0 hunks)unit_tests/sources/streams/concurrent/test_adapters.py
(0 hunks)unit_tests/sources/streams/concurrent/test_default_stream.py
(2 hunks)
💤 Files with no reviewable changes (6)
- airbyte_cdk/sources/file_based/stream/abstract_file_based_stream.py
- unit_tests/sources/streams/concurrent/test_adapters.py
- unit_tests/sources/streams/concurrent/scenarios/thread_based_concurrent_stream_scenarios.py
- airbyte_cdk/sources/declarative/concurrent_declarative_source.py
- airbyte_cdk/sources/file_based/stream/concurrent/adapters.py
- airbyte_cdk/sources/streams/concurrent/adapters.py
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/spec.json:9-15
Timestamp: 2024-11-15T00:59:08.154Z
Learning: When code in `airbyte_cdk/cli/source_declarative_manifest/` is being imported from another repository, avoid suggesting modifications to it during the import process.
📚 Learning: the files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from ...
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
Applied to files:
airbyte_cdk/sources/file_based/availability_strategy/__init__.py
airbyte_cdk/sources/declarative/checks/check_dynamic_stream.py
unit_tests/sources/declarative/checks/test_check_stream.py
airbyte_cdk/sources/streams/concurrent/default_stream.py
airbyte_cdk/sources/file_based/availability_strategy/abstract_file_based_availability_strategy.py
📚 Learning: when code in `airbyte_cdk/cli/source_declarative_manifest/` is being imported from another repositor...
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/spec.json:9-15
Timestamp: 2024-11-15T00:59:08.154Z
Learning: When code in `airbyte_cdk/cli/source_declarative_manifest/` is being imported from another repository, avoid suggesting modifications to it during the import process.
Applied to files:
airbyte_cdk/sources/file_based/availability_strategy/__init__.py
airbyte_cdk/sources/declarative/checks/check_dynamic_stream.py
airbyte_cdk/sources/file_based/availability_strategy/abstract_file_based_availability_strategy.py
📚 Learning: when modifying the `yamldeclarativesource` class in `airbyte_cdk/sources/declarative/yaml_declarativ...
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
Applied to files:
airbyte_cdk/sources/declarative/checks/check_dynamic_stream.py
unit_tests/sources/declarative/checks/test_check_stream.py
airbyte_cdk/sources/streams/concurrent/availability_strategy.py
airbyte_cdk/sources/streams/concurrent/default_stream.py
airbyte_cdk/sources/file_based/availability_strategy/abstract_file_based_availability_strategy.py
airbyte_cdk/sources/declarative/checks/check_stream.py
🧬 Code Graph Analysis (3)
airbyte_cdk/sources/file_based/availability_strategy/__init__.py (1)
airbyte_cdk/sources/file_based/availability_strategy/abstract_file_based_availability_strategy.py (1)
AbstractFileBasedAvailabilityStrategy
(19-47)
unit_tests/sources/declarative/checks/test_check_stream.py (4)
unit_tests/sources/mock_server_tests/mock_source_fixture.py (2)
streams
(406-413)spec
(415-430)airbyte_cdk/sources/abstract_source.py (1)
streams
(74-79)unit_tests/sources/test_source.py (2)
streams
(58-61)streams
(148-150)airbyte_cdk/sources/streams/core.py (1)
Stream
(118-703)
airbyte_cdk/sources/streams/concurrent/abstract_stream.py (2)
airbyte_cdk/sources/streams/concurrent/default_stream.py (1)
check_availability
(97-125)airbyte_cdk/sources/streams/concurrent/availability_strategy.py (1)
StreamAvailability
(9-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-intercom
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-google-drive
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Analyze (python)
🔇 Additional comments (11)
airbyte_cdk/sources/file_based/availability_strategy/__init__.py (1)
1-7
: LGTM on the cleanup!The removal of
AbstractFileBasedAvailabilityStrategyWrapper
from the exports is consistent with the broader refactoring to eliminate deprecated availability strategy wrappers. The remaining exports for the base abstract class and default implementation look appropriate.airbyte_cdk/sources/streams/concurrent/abstract_stream.py (1)
92-96
: Good reorganization of the abstract methods!Moving
check_availability
to be closer to the other core properties likecursor
makes the interface organization more logical. The docstring update also makes the return value clearer.unit_tests/sources/declarative/checks/test_check_stream.py (2)
19-19
: Great addition for better test mocking!Adding the Stream import to support the spec parameter in MagicMock is a good improvement for type safety.
49-49
: Excellent improvement to mock specifications!Using
MagicMock(spec=Stream)
instead of plainMagicMock()
provides much better type safety in tests. This will catch attribute access errors early and ensures the mocks conform to the actual Stream interface. Nice consistency across all the test functions!Also applies to: 81-81, 95-95
airbyte_cdk/sources/declarative/checks/check_dynamic_stream.py (3)
7-7
: Good updates for AbstractStream support!The import changes look good - adding the Union type and importing the new
evaluate_availability
function aligns well with the migration goals. The more specific import path forAbstractSource
is also an improvement.Also applies to: 9-12
38-38
: Appropriate migration typing!The Union type hint for supporting both
Stream
andAbstractStream
with thetype: ignore
comment properly acknowledges this is a transitional state during the migration. This is exactly the kind of pragmatic approach needed for incremental migrations.
47-47
: Excellent refactoring to unified availability checking!Replacing the strategy pattern with the
evaluate_availability
function is much cleaner and aligns perfectly with the PR objective of supporting AbstractStream. This function can handle both stream types appropriately, which is exactly what's needed during the migration phase.airbyte_cdk/sources/streams/concurrent/default_stream.py (1)
97-126
: Well-implemented availability check!The implementation correctly handles various scenarios and gracefully manages exceptions. The approach of checking the first record is efficient. Just one minor thought - the comment on lines 104-107 about legacy behavior could be clearer about what specific legacy pattern this addresses, wdyt?
airbyte_cdk/sources/file_based/availability_strategy/abstract_file_based_availability_strategy.py (1)
19-48
: LGTM! Clean refactoringThe changes appropriately remove deprecated dependencies and the optional
source
parameter provides good backward compatibility during the migration period.airbyte_cdk/sources/declarative/checks/check_stream.py (2)
17-30
: Excellent migration abstraction!The
evaluate_availability
function provides a clean way to handle both stream types during the transition period. The clear documentation and appropriate error handling for unsupported types make this a solid implementation.
71-71
: Type ignore comment is well-justifiedGood to see the
type: ignore
comment includes clear reasoning about this being a migration step. This helps future maintainers understand why it's there.
What
Following the removal of unused stuff regarding availability, we want to support availability checks as much as possible as is for the new AbstractStream interface until we migrate fully to it.
What might change when we're done with the migration?
AbstractSource.streams
DeclarativeStream
stuffHow
Assuming
steams
can returnAbstractStream
within the list and if it is the case, availability is done throughAbstractStream.check_availability
.Note that some of the exit_on_rate_limit logic wasn't ported from the legacy CDK because nothing sets this in low code and we've only seen one instance of a source playing with this concept (source-github). For more information, see this Slack thread
Also note that as part of this PR, the most of the added code will not run in prod because we never return an
AbstractStream
inAbstractSource.streams
. In order to test it though, I replace the return statement in ModelToComponentFactory.create_declarative_stream` by the following:Summary by CodeRabbit
Refactor
Tests
Chores