-
Notifications
You must be signed in to change notification settings - Fork 30
feat: add robust datetime parsing fallback to DatetimeBasedCursor, deprecate/hide cursor_datetime_formats
and datetime_format
in the UI
#735
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add ab_datetime_try_parse as fallback when expected formats fail - Maintain backward compatibility by trying expected formats first - Update schema documentation to reflect new fallback behavior - Update test to use truly unparseable string for error case - Preserve original cursor value format in state storage - Eliminate parsing errors for ISO8601/RFC3339 compliant datetimes Co-Authored-By: AJ Steers <[email protected]>
Original prompt from AJ Steers
|
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1724868030-datetime-parsing-simplification#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1724868030-datetime-parsing-simplification Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances the DatetimeBasedCursor's datetime parsing capabilities by adding a robust fallback mechanism to handle a wider variety of datetime formats while maintaining backward compatibility.
- Adds
ab_datetime_try_parse
as a fallback when specifiedcursor_datetime_formats
fail to parse datetime values - Updates test case to use a truly unparseable string instead of a valid ISO datetime
- Updates schema documentation to explain the new fallback behavior
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py | Implements fallback parsing using ab_datetime_try_parse in the parse_date method |
airbyte_cdk/sources/declarative/declarative_component_schema.yaml | Updates documentation to describe the new fallback parsing behavior |
unit_tests/sources/declarative/incremental/test_datetime_based_cursor.py | Fixes test case to use an actually unparseable string instead of valid ISO datetime |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py
Outdated
Show resolved
Hide resolved
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. 📝 WalkthroughWalkthroughAdds a robust fallback datetime parsing path used by DatetimeBasedCursor and MinMaxDatetime, updates declarative schema to document/hide fields, and expands tests to cover the new fallback and invalid inputs. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller
participant Cursor as DatetimeBasedCursor / MinMaxDatetime
participant Fmts as Configured Formats
participant Fallback as ab_datetime_try_parse
Caller->>Cursor: parse_date / get_datetime(datetime_expr)
rect rgb(245, 250, 255)
Note right of Cursor: 1) Try configured format(s)
Cursor->>Fmts: Apply formats from config (if any)
alt Any format parses
Fmts-->>Cursor: Parsed datetime
Cursor-->>Caller: Return datetime (apply min/max if relevant)
else No configured format or none match
Note right of Cursor: 2) New robust fallback
Cursor->>Fallback: Try robust ISO8601/RFC3339/epoch parsing
alt Fallback parses successfully
Fallback-->>Cursor: Parsed datetime
Cursor-->>Caller: Return datetime (apply min/max if relevant)
else Fallback fails
Cursor-->>Caller: Raise ValueError
end
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Would you like a brief changelog entry for release notes as well, wdyt? ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1)
947-949
: Document the full scope of the fallback and ordering.Nice addition. Two small clarifications could help users:
- Explicitly note that the robust fallback also covers Unix timestamps and common variants (as per PR summary).
- Clarify that when cursor_datetime_formats is omitted, the Outgoing Datetime Format is attempted once (to avoid implying it may be tried multiple times).
Wdyt about updating the text accordingly?
- If none of the specified formats match, the system will attempt to parse the value using robust datetime parsing that handles most ISO8601/RFC3339 compliant formats. - If not provided, the Outgoing Datetime Format will be used as the first attempt. + If none of the specified formats match, the system will attempt to parse the value using a robust datetime parser that handles most ISO8601/RFC3339 variants, Unix timestamps (seconds and milliseconds), and other common formats. + If not provided, the Outgoing Datetime Format will be used as the first attempt (exactly once).unit_tests/sources/declarative/incremental/test_datetime_based_cursor.py (1)
1024-1024
: LGTM: negative test now targets an actually-unparseable string.This aligns with the new fallback behavior. Could we also add a positive test to assert the fallback succeeds for ISO/RFC3339 and Unix timestamp inputs when no configured format matches, to prevent regressions? Wdyt?
@@ def test_given_unknown_format_when_parse_date_then_raise_error(): @@ with pytest.raises(ValueError): slicer.parse_date("not-a-valid-datetime-string") + +def test_parse_date_fallback_iso_when_configured_formats_fail(): + slicer = DatetimeBasedCursor( + start_datetime=MinMaxDatetime("2021-01-01T00:00:00.000000+0000", parameters={}), + cursor_field=InterpolatedString(cursor_field, parameters={}), + # Intentionally set an incompatible outgoing and cursor formats + datetime_format="%d/%m/%Y", + cursor_datetime_formats=["%m/%d/%Y"], + config=config, + parameters={}, + ) + # ISO/RFC3339 input should be parsed by the robust fallback + parsed = slicer.parse_date("2021-01-01T00:00:00Z") + assert parsed == datetime.datetime(2021, 1, 1, 0, 0, tzinfo=datetime.timezone.utc) + +def test_parse_date_fallback_unix_timestamp_when_configured_formats_fail(): + slicer = DatetimeBasedCursor( + start_datetime=MinMaxDatetime("2021-01-01T00:00:00.000000+0000", parameters={}), + cursor_field=InterpolatedString(cursor_field, parameters={}), + datetime_format="%Y-%m-%d", # does not include %s + cursor_datetime_formats=["%Y/%m/%d"], # still no %s + config=config, + parameters={}, + ) + parsed = slicer.parse_date("1609459200") # 2021-01-01T00:00:00Z + assert parsed == datetime.datetime(2021, 1, 1, 0, 0, tzinfo=datetime.timezone.utc)airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py (3)
24-24
: Remove unused import (keep CI lint fast and clean).ab_datetime_format isn’t used in this module. Shall we drop it to avoid dead imports, wdyt?
-from airbyte_cdk.utils.datetime_helpers import ab_datetime_format, ab_datetime_try_parse +from airbyte_cdk.utils.datetime_helpers import ab_datetime_try_parse
124-126
: Avoid double-trying the same format when no cursor formats are provided.Right now, parse_date loops over
self.cursor_datetime_formats + [self.datetime_format]
, and post_init seedscursor_datetime_formats
with[self.datetime_format]
when empty. That causes the outgoing format to be tried twice. Tiny perf nit, but easy to avoid. Shall we keepcursor_datetime_formats
empty and rely on parse_date to append the outgoing format once? Wdyt?- if not self.cursor_datetime_formats: - self.cursor_datetime_formats = [self.datetime_format] + if not self.cursor_datetime_formats: + # Leave empty; parse_date will try `datetime_format` exactly once at the end. + self.cursor_datetime_formats = []
311-322
: Great fallback; refine error message and consider observability.The fallback looks solid. Two ideas:
- Make the error message explicit that the robust fallback was also attempted (helps debugging).
- Optionally emit a debug log when the fallback path succeeds to surface “implicit format” usage in production (feature-flaggable), wdyt?
def parse_date(self, date: str) -> datetime.datetime: for datetime_format in self.cursor_datetime_formats + [self.datetime_format]: try: return self._parser.parse(date, datetime_format) except ValueError: pass parsed_dt = ab_datetime_try_parse(date) if parsed_dt: + # Optionally: self._send_log(Level.DEBUG, f"Parsed '{date}' via robust fallback parser") return parsed_dt - raise ValueError(f"No format in {self.cursor_datetime_formats} matching {date}") + attempted = self.cursor_datetime_formats + [self.datetime_format] + raise ValueError( + f"No matching datetime format for value '{date}'. Tried {attempted} and robust fallback." + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(1 hunks)airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py
(2 hunks)unit_tests/sources/declarative/incremental/test_datetime_based_cursor.py
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py (1)
airbyte_cdk/utils/datetime_helpers.py (2)
ab_datetime_format
(464-499)ab_datetime_try_parse
(445-461)
unit_tests/sources/declarative/incremental/test_datetime_based_cursor.py (1)
airbyte_cdk/sources/declarative/incremental/datetime_based_cursor.py (1)
parse_date
(311-322)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-shopify
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Analyze (python)
Co-authored-by: Copilot <[email protected]>
- Add comment explaining that ab_datetime_try_parse handles ISO8601/RFC3339 formats - Addresses PR feedback from @aaronsteers requesting documentation Co-Authored-By: AJ Steers <[email protected]>
- Add robust datetime parsing fallback to MinMaxDatetime using ab_datetime_try_parse - Hide cursor_datetime_formats and datetime_format fields in UI with airbyte_hidden: true - Expand parametrized tests with robust fallback scenarios that now succeed - Maintain backward compatibility while simplifying configuration - Keep datetime_format for API output formatting, decouple from cursor storage Co-Authored-By: AJ Steers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3)
946-946
: Confirm Builder honors airbyte_hidden on array items.Some UIs only respect airbyte_hidden at the property level, not within items. Can you confirm the Builder hides the item UI here; otherwise shall we move airbyte_hidden up to cursor_datetime_formats instead, wdyt?
948-951
: Make fallback capabilities explicit (timestamps).Since the fallback also parses Unix timestamps (seconds/milliseconds, floats), should we call that out to reduce confusion, e.g., “Fallback supports ISO8601/RFC3339 and Unix timestamps (s, ms, float)”? wdyt?
2808-2812
: Doc clarity for MinMaxDatetime format + fallback.Nice addition to hide this field. Given get_datetime now attempts a robust fallback, should we add a short note to this description indicating that if parsing with the provided format fails, robust parsing is attempted (and specify whether that applies to datetime, min_datetime, and max_datetime)? This prevents UX confusion. wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
airbyte_cdk/sources/declarative/datetime/min_max_datetime.py
(2 hunks)airbyte_cdk/sources/declarative/declarative_component_schema.yaml
(2 hunks)unit_tests/sources/declarative/incremental/test_datetime_based_cursor.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- unit_tests/sources/declarative/incremental/test_datetime_based_cursor.py
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte_cdk/sources/declarative/datetime/min_max_datetime.py (3)
airbyte_cdk/utils/datetime_helpers.py (1)
ab_datetime_try_parse
(445-461)airbyte_cdk/sources/declarative/interpolation/interpolated_string.py (1)
eval
(35-55)airbyte_cdk/sources/declarative/datetime/datetime_parser.py (1)
parse
(21-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-shopify
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/datetime/min_max_datetime.py (1)
11-11
: LGTM on import.Importing ab_datetime_try_parse is appropriate for the new fallback path.
cursor_datetime_formats
and datetime_format
in the UI
cursor_datetime_formats
and datetime_format
in the UIcursor_datetime_formats
and datetime_format
in the UI
feat: add robust datetime parsing fallback to DatetimeBasedCursor
Summary
This PR enhances the DatetimeBasedCursor's datetime parsing to eliminate parsing errors for valid datetime formats while maintaining full backward compatibility. The implementation adds
ab_datetime_try_parse
as a fallback when the expectedcursor_datetime_formats
fail to parse a datetime value.Key changes:
cursor_datetime_formats
anddatetime_format
toairbyte_hidden=True
, essentially deprecating these inputs.parse_date()
method to use robust fallback parsing viaab_datetime_try_parse
Behavior: The cursor now tries user-specified formats first (as before), but falls back to robust parsing that handles most ISO8601/RFC3339 compliant formats, Unix timestamps, and other common datetime formats. This should eliminate the majority of datetime parsing errors while keeping the same performance for connectors using expected formats.
Review & Testing Checklist for Human
High Priority (3 items):
cursor_datetime_formats
Recommended end-to-end test plan:
cursor_datetime_formats
to verify fallback worksNotes
Summary by CodeRabbit
New Features
Documentation
Style/Chores
Tests