fix: Allow composite primary key overrides in PyAirbyte #739
- Fixed validation logic in catalog_providers.py that incorrectly rejected composite primary keys
- Changed validation from checking column count to checking for nested field references (dots)
- Composite keys like ['id', 'category'] now work correctly
- Nested keys like ['data.id'] are still properly rejected
- All existing unit tests pass

Fixes issue where set_primary_keys(stream=['col1', 'col2']) would fail with 'Nested primary keys are not supported' error despite composite keys being supported.

Co-Authored-By: AJ Steers <[email protected]>
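The check described in this commit message can be sketched in isolation. This is a minimal sketch, not the code in catalog_providers.py: the function name check_no_nested_pks, the use of ValueError, and the error wording are assumptions made for illustration, and normalized_pks is assumed to be a list of lists of column-name strings.

```python
def check_no_nested_pks(normalized_pks: list[list[str]]) -> None:
    """Raise if any primary-key entry looks like a nested field reference.

    Hypothetical helper: the real function name, exception type, and
    message used in PyAirbyte may differ.
    """
    for pk_nodes in normalized_pks:
        # Rule from the PR: reject on dot notation, not on column count.
        if any("." in node for node in pk_nodes):
            raise ValueError(f"Nested primary keys are not supported: {pk_nodes!r}")


# A composite key passes; a dotted (nested) reference is rejected.
check_no_nested_pks([["id", "category"]])
try:
    check_no_nested_pks([["data.id"]])
except ValueError as err:
    print(err)
```

Under these assumptions the composite key is accepted silently and the dotted reference raises.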
👋 Greetings, Airbyte Team Member!
Here are some helpful tips and reminders for your convenience.

Testing This PyAirbyte Version
You can test this version of PyAirbyte using the following:

# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1754070845-fix-composite-primary-keys' pyairbyte --help
# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1754070845-fix-composite-primary-keys'

Community Support
Questions? Join the #pyairbyte channel in our Slack workspace.
Pull Request Overview
This PR fixes a bug in PyAirbyte where composite primary keys (multiple columns like ['id', 'category']) were incorrectly rejected due to overly restrictive validation logic.
- Fixed validation in get_primary_keys() to allow composite primary keys while still preventing nested field references
- Changed validation from checking column count to checking for dot notation in field names
- Updated error message to clarify that only nested fields (not composite keys) are unsupported
  for pk_nodes in normalized_pks:
-     if len(pk_nodes) != 1:
+     if any("." in node for node in pk_nodes):
The validation logic may incorrectly reject valid field names that contain dots but are not nested references. Consider using a more robust check that distinguishes between legitimate field names with dots and actual nested field paths.
Suggested change:
- if any("." in node for node in pk_nodes):
+ if len(pk_nodes) > 1:
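To make the trade-off in this review thread concrete, the three conditions under discussion can be compared on a few sample inputs. The sample lists, including the flat column name 'user.name', and the function names are made up for illustration; only the three conditions themselves come from the PR and the suggestion above.

```python
# Each predicate returns True when the key would be REJECTED.

def old_check(pk_nodes: list[str]) -> bool:
    """Pre-PR condition: reject unless there is exactly one node."""
    return len(pk_nodes) != 1


def dot_check(pk_nodes: list[str]) -> bool:
    """Condition introduced by the PR: reject if any node contains a dot."""
    return any("." in node for node in pk_nodes)


def suggested_check(pk_nodes: list[str]) -> bool:
    """Condition from the review suggestion: reject if more than one node."""
    return len(pk_nodes) > 1


# Hypothetical inputs chosen to exercise each case.
samples = {
    "single": ["id"],
    "composite": ["id", "category"],
    "nested": ["data.id"],
    "dotted flat name": ["user.name"],
}

rejected = {
    label: (old_check(nodes), dot_check(nodes), suggested_check(nodes))
    for label, nodes in samples.items()
}

for label, flags in rejected.items():
    print(f"{label:<18} old={flags[0]!s:<5} dot={flags[1]!s:<5} suggested={flags[2]}")
```

On these inputs the suggested len(pk_nodes) > 1 rejects the composite key, which is the behavior the PR set out to remove, while the dot check rejects the hypothetical flat column 'user.name', which is the concern the comment raises.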
📝 Walkthrough
The validation logic in the …
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes
fix: Allow composite primary keys in PyAirbyte
Summary
Fixed a bug in PyAirbyte's primary key validation that incorrectly rejected composite primary keys like ['id', 'category'] with the error "Nested primary keys are not supported. Each PK column should have exactly one node."

The validation logic in catalog_providers.py was checking if each primary key had exactly one column (len(pk_nodes) != 1), which blocked composite keys. The intent was to prevent nested field references like ['data.id'], not composite keys.

Changes made:
- Fixed get_primary_keys() validation in airbyte/shared/catalog_providers.py
- Composite keys like ['id', 'category'] now work correctly
- Nested keys like ['data.id'] are still properly rejected

Review & Testing Checklist for Human
- any("." in node for node in pk_nodes) properly distinguishes between composite keys (['id', 'category']) and nested keys (['data.id'])

Recommended test plan:
Notes
This fix resolves the issue reported by AJ Steers where composite primary keys were incorrectly blocked by PyAirbyte's validation. The change is minimal (2 lines) but affects a core validation function used throughout the system.
Key validation change:
- Before: if len(pk_nodes) != 1: (rejected any multi-column key)
- After: if any("." in node for node in pk_nodes): (only rejects nested field references)

All existing unit tests pass, including the specific composite primary key test that was previously failing. The reproduction script now successfully processes composite primary keys with both DuckDB and Snowflake destinations.
Session details: Requested by AJ Steers (@aaronsteers) in Devin session: https://app.devin.ai/sessions/bc5912d0a95e499dab8e86e329f1face
Summary by CodeRabbit