Conversation

maxi297
Contributor

@maxi297 maxi297 commented Sep 4, 2025

The script will also not migrate the following even though they should be fine:

  • source-bing-ads: custom partition router
  • source-gong: custom cursor
  • source-google-sheets: custom partition router
  • source-hubspot: the weird test that we don't like
  • source-jira: custom partition router
  • source-instagram: custom partition router
  • source-intercom: custom retriever
  • source-orb (not in the official list that we maintain, so I haven't investigated)

Summary by CodeRabbit

  • New Features

    • Added a migration tool to upgrade source connectors to CDK v7, detecting eligible declarative and Python connectors and migrating them automatically.
    • Updated connector base images or dependency versions and attempted to refresh lockfiles, with progress reporting and graceful warnings on failures.
  • Chores

    • Automatically increments connector changelogs with a new patch version and date after successful migrations.
  • Style

    • Improved console output and diagnostics for migration status and errors.


github-actions bot commented Sep 4, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/cdk7_migration_script#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/cdk7_migration_script

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerates git-committed build artifacts, such as the Pydantic models generated from the manifest JSON schema in YAML
  • /poe <command> - Runs any poe command in the CDK environment


@maxi297 maxi297 requested a review from brianjlai September 4, 2025 17:01
@maxi297
Contributor Author

maxi297 commented Sep 4, 2025

/autofix

Contributor

coderabbitai bot commented Sep 4, 2025

Caution

Review failed

An error occurred during the review process. Please try again later.



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (8)
bin/v7_migration.py (8)

456-462: Broaden PEP 621 dependency detection

startswith('airbyte-cdk') misses common forms (extras and varied specifiers). Would a regex handle airbyte-cdk, optional extras, and any comparator, wdyt?

-                for i, dep in enumerate(project_deps):
-                    if isinstance(dep, str) and dep.startswith('airbyte-cdk'):
-                        project_deps[i] = 'airbyte-cdk>=7.0.0,<8.0.0'
+                version_rx = re.compile(r'^airbyte-cdk(\[.*\])?\s*[~<>=!].*')
+                for i, dep in enumerate(project_deps):
+                    if isinstance(dep, str) and version_rx.match(dep):
+                        # Preserve extras if any
+                        extras = re.match(r'^airbyte-cdk(\[.*\])?', dep).group(1) or ''
+                        project_deps[i] = f'airbyte-cdk{extras}>=7.0.0,<8.0.0'
                         updated = True
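Outside the diff, the suggested specifier handling can be sketched as a standalone helper (function name hypothetical):

```python
import re

def bump_cdk_dep(dep: str) -> str:
    """Rewrite an airbyte-cdk PEP 621 requirement to the v7 range, preserving extras."""
    m = re.match(r"^airbyte-cdk(\[.*\])?\s*[~<>=!]", dep)
    if not m:
        return dep  # not an airbyte-cdk requirement with a version specifier
    extras = m.group(1) or ""
    return f"airbyte-cdk{extras}>=7.0.0,<8.0.0"
```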

108-174: Detect more v6 specs in pyproject

Currently only caret ^6 is caught. Many connectors use ranges like >=6,<7 or pinned ==6.*. Shall we detect these too to widen coverage, wdyt?

-                    if isinstance(cdk_version, str) and '^6' in cdk_version:
+                    if isinstance(cdk_version, str) and re.search(r'(\^6|>=\s*6(\.\d+)*\s*,\s*<\s*7|==\s*6(\.\d+)*)', cdk_version):
                         return True

And similarly for PEP 621 list entries:

-                            if '^6' in dep:
+                            if re.search(r'(\^6|>=\s*6(\.\d+)*\s*,\s*<\s*7|==\s*6(\.\d+)*)', dep):
                                 return True
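The broadened pattern can be exercised directly against the common v6 forms:

```python
import re

# Matches caret (^6), bounded ranges (>=6,<7), and exact pins (==6.x) of CDK v6.
V6_SPEC = re.compile(r"(\^6|>=\s*6(\.\d+)*\s*,\s*<\s*7|==\s*6(\.\d+)*)")

def is_v6_spec(spec: str) -> bool:
    return bool(V6_SPEC.search(spec))
```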

519-537: Changelog parsing is fragile; tie version extraction to the table

_extract_current_version finds the first | x.y.z | anywhere, which might not be the changelog’s top entry. Shall we parse the table header and then read the next data row to extract the current version, wdyt?

-        current_version = self._extract_current_version(content)
+        current_version = self._extract_current_version(content)
         if not current_version:
             print(f"Could not determine current version for {self.name}")
             return False

Outside these ranges, consider replacing _extract_current_version with:

def _extract_current_version(self, content: str) -> Optional[str]:
    table_rx = re.compile(r'(\| Version \| Date.*?\n\| :--.*?\n)(?P<row>\|.*\n)', re.S)
    m = table_rx.search(content)
    if not m:
        return None
    first_row = m.group('row')
    ver_m = re.search(r'\|\s*(\d+\.\d+\.\d+)\s*\|', first_row)
    return ver_m.group(1) if ver_m else None

Also applies to: 543-571, 579-597
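As a sanity check, the table-anchored parser behaves as intended on a minimal changelog (sample content is illustrative):

```python
import re
from typing import Optional

def extract_current_version(content: str) -> Optional[str]:
    # Anchor on the changelog table header and separator, then read the first data row.
    table_rx = re.compile(r"(\| Version \| Date.*?\n\| :--.*?\n)(?P<row>\|.*\n)", re.S)
    m = table_rx.search(content)
    if not m:
        return None
    ver_m = re.search(r"\|\s*(\d+\.\d+\.\d+)\s*\|", m.group("row"))
    return ver_m.group(1) if ver_m else None

sample = (
    "Mentions | 9.9.9 | in prose before the table.\n"
    "| Version | Date       | Pull Request | Subject |\n"
    "| :------ | :--------- | :----------- | :------ |\n"
    "| 1.2.3   | 2025-09-04 | 12345        | CDK v7  |\n"
)
```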


624-645: Docstring says "source-* and destination-*" but code returns only source-*

Should we fix the docstring (or extend the function) to avoid confusion, wdyt?

-    Get all connector directories (source-* and destination-*).
+    Get all source connector directories (source-*).

52-94: manifest_content typing vs behavior and search paths

The property is annotated Optional[str] but always returns a string (possibly empty). Also, Python packages sometimes nest under src/. Should we (a) return None when absent and (b) also probe self.path/'src'/package_name/manifest.yaml, wdyt?


266-339: Import scanning could false-positive/negative; consider AST-based pass

String scanning may catch commented imports or miss multiline imports. Would you like a quick ast walk to find real Import/ImportFrom nodes and identifiers, wdyt?
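A minimal sketch of that AST pass (helper name hypothetical); comments and strings are naturally ignored because they never produce Import nodes:

```python
import ast

def imported_modules(source: str) -> set:
    """Top-level module names from real Import/ImportFrom nodes only."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names
```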


696-703: Report non-eligibility reasons to aid manual follow-up

We print only names of non-eligible connectors. Shall we log per-connector reasons (e.g., which checks failed) to speed up manual migrations, wdyt?


9-11: TODO: supply declarative image digest

Do you want to add a small README note and make the new --manifest-image-digest flag explicit in the CLI help so we don’t forget before running this at scale, wdyt?

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 5378cef and 13d395f.

📒 Files selected for processing (1)
  • bin/v7_migration.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
bin/v7_migration.py

[error] 13-13: Ruff I001: Import block is un-sorted or un-formatted. Organize imports. Command: poetry run ruff check .

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-shopify
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Analyze (python)

Comment on lines 386 to 426
def _migrate_declarative_manifest_to_v7(self, sha256_hash: str) -> bool:
"""
Migrate a declarative manifest connector to use CDK v7 base image.

Args:
sha256_hash: The SHA256 hash for the v7.0.0 base image

Returns:
True if migration was successful, False otherwise
"""
try:
metadata_file = self.path / "metadata.yaml"

if not metadata_file.exists():
print(f"No metadata.yaml found for {self.name}")
return False

# Load the current metadata
with open(metadata_file, 'r') as f:
metadata_content = f.read()

# Update the baseImage to version 7
new_base_image = f"docker.io/airbyte/source-declarative-manifest:7.0.0@sha256:{sha256_hash}"

# Replace the base image using regex to preserve formatting
base_image_pattern = r'(baseImage:\s*)[^\n\r]+'
replacement = rf'\g<1>{new_base_image}'

updated_content = re.sub(base_image_pattern, replacement, metadata_content)

# Write back the updated metadata
with open(metadata_file, 'w') as f:
f.write(updated_content)

print(f"Successfully migrated {self.name} to use CDK v7 declarative manifest")
return True

except Exception as e:
print(f"Error migrating declarative manifest for {self.name}: {e}")
return False


🛠️ Refactor suggestion

Avoid regex-writing YAML; update structured field instead

Editing YAML via regex risks touching the wrong key and breaking formatting. Shall we load, update data.connectorBuildOptions.baseImage, and dump back, wdyt?

-            # Load the current metadata
-            with open(metadata_file, 'r') as f:
-                metadata_content = f.read()
-            
-            # Update the baseImage to version 7
-            new_base_image = f"docker.io/airbyte/source-declarative-manifest:7.0.0@sha256:{sha256_hash}"
-            
-            # Replace the base image using regex to preserve formatting
-            base_image_pattern = r'(baseImage:\s*)[^\n\r]+'
-            replacement = rf'\g<1>{new_base_image}'
-            
-            updated_content = re.sub(base_image_pattern, replacement, metadata_content)
-            
-            # Write back the updated metadata
-            with open(metadata_file, 'w') as f:
-                f.write(updated_content)
+            with open(metadata_file, 'r', encoding='utf-8') as f:
+                meta = yaml.safe_load(f)
+            new_base_image = f"docker.io/airbyte/source-declarative-manifest:7.0.0@sha256:{sha256_hash}"
+            meta.setdefault('data', {}).setdefault('connectorBuildOptions', {})['baseImage'] = new_base_image
+            with open(metadata_file, 'w', encoding='utf-8') as f:
+                yaml.safe_dump(meta, f, sort_keys=False)

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In bin/v7_migration.py around lines 386 to 426, the function currently edits
metadata.yaml with a regex which can target the wrong key and break YAML;
instead load the file as YAML, update the structured field
data.connectorBuildOptions.baseImage to
"docker.io/airbyte/source-declarative-manifest:7.0.0@sha256:<sha256_hash>"
(create connectorBuildOptions and baseImage keys if missing), then dump the YAML
back to the file using a YAML library (safe_load/safe_dump or ruamel.yaml to
better preserve formatting) and handle IO/errors accordingly.
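The structured update itself is easy to isolate: operate on the dict produced by yaml.safe_load and hand it back to yaml.safe_dump (helper name hypothetical):

```python
def set_base_image(meta: dict, sha256_hash: str) -> dict:
    """Set data.connectorBuildOptions.baseImage on metadata parsed with yaml.safe_load."""
    image = f"docker.io/airbyte/source-declarative-manifest:7.0.0@sha256:{sha256_hash}"
    meta.setdefault("data", {}).setdefault("connectorBuildOptions", {})["baseImage"] = image
    return meta
```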

Comment on lines 449 to 462
# Check poetry dependencies (dict format)
poetry_deps = pyproject_data.get('tool', {}).get('poetry', {}).get('dependencies', {})
if isinstance(poetry_deps, dict) and 'airbyte-cdk' in poetry_deps:
poetry_deps['airbyte-cdk'] = '^7'
updated = True

# Check PEP 621 dependencies (list format)
project_deps = pyproject_data.get('project', {}).get('dependencies', [])
if isinstance(project_deps, list):
for i, dep in enumerate(project_deps):
if isinstance(dep, str) and dep.startswith('airbyte-cdk'):
project_deps[i] = 'airbyte-cdk>=7.0.0,<8.0.0'
updated = True


🛠️ Refactor suggestion

Handle Poetry dependency as a table (extras, markers)

Poetry often defines airbyte-cdk as a table: {version="^6", extras=[...]}. Could we update both string and table forms, wdyt?

-            if isinstance(poetry_deps, dict) and 'airbyte-cdk' in poetry_deps:
-                poetry_deps['airbyte-cdk'] = '^7'
-                updated = True
+            if isinstance(poetry_deps, dict) and 'airbyte-cdk' in poetry_deps:
+                dep = poetry_deps['airbyte-cdk']
+                if isinstance(dep, str):
+                    poetry_deps['airbyte-cdk'] = '^7'
+                    updated = True
+                elif isinstance(dep, dict):
+                    dep['version'] = '^7'
+                    updated = True
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Check poetry dependencies (dict format)
poetry_deps = pyproject_data.get('tool', {}).get('poetry', {}).get('dependencies', {})
if isinstance(poetry_deps, dict) and 'airbyte-cdk' in poetry_deps:
poetry_deps['airbyte-cdk'] = '^7'
updated = True
# Check PEP 621 dependencies (list format)
project_deps = pyproject_data.get('project', {}).get('dependencies', [])
if isinstance(project_deps, list):
for i, dep in enumerate(project_deps):
if isinstance(dep, str) and dep.startswith('airbyte-cdk'):
project_deps[i] = 'airbyte-cdk>=7.0.0,<8.0.0'
updated = True
# Check poetry dependencies (dict format)
poetry_deps = pyproject_data.get('tool', {}).get('poetry', {}).get('dependencies', {})
if isinstance(poetry_deps, dict) and 'airbyte-cdk' in poetry_deps:
dep = poetry_deps['airbyte-cdk']
if isinstance(dep, str):
poetry_deps['airbyte-cdk'] = '^7'
updated = True
elif isinstance(dep, dict):
dep['version'] = '^7'
updated = True
# Check PEP 621 dependencies (list format)
project_deps = pyproject_data.get('project', {}).get('dependencies', [])
if isinstance(project_deps, list):
for i, dep in enumerate(project_deps):
if isinstance(dep, str) and dep.startswith('airbyte-cdk'):
project_deps[i] = 'airbyte-cdk>=7.0.0,<8.0.0'
updated = True
🤖 Prompt for AI Agents
In bin/v7_migration.py around lines 449 to 462, the code only updates poetry
dependencies when the value is a string but Poetry often represents dependencies
as a table/dict (e.g., {version="^6", extras=[...]}); modify the logic so that
if poetry_deps['airbyte-cdk'] is a dict you update its 'version' field to '^7'
(or set 'version' = '^7' if missing) while preserving other keys like 'extras'
and markers, and still set updated = True; keep the existing behavior for string
values unchanged.
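Both forms can be handled in one small helper (name hypothetical), leaving extras and markers untouched in the table case:

```python
def bump_poetry_cdk(poetry_deps: dict) -> bool:
    """Upgrade airbyte-cdk to ^7 whether declared as a string or a Poetry table."""
    dep = poetry_deps.get("airbyte-cdk")
    if isinstance(dep, str):
        poetry_deps["airbyte-cdk"] = "^7"
        return True
    if isinstance(dep, dict):
        dep["version"] = "^7"  # keep extras/markers alongside the bumped version
        return True
    return False
```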

Comment on lines 647 to 666
def main():
parser = argparse.ArgumentParser(
description="Migrate Airbyte connectors to CDK version 7",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Migrate 10 connectors (default count)
python v7_migration.py --connectors-path /path/to/connectors

# Migrate 5 connectors
python v7_migration.py --connectors-path /path/to/connectors --count 5

# Migrate all eligible connectors (use a large number)
python v7_migration.py --connectors-path /path/to/connectors --count 1000

# Use relative path
python v7_migration.py --connectors-path ../connectors --count 3
"""
)


🛠️ Refactor suggestion

CLI help/examples mismatch the actual flag

Examples show --connectors-path, but the parser defines --airbyte-repo-path. Could we align examples to the real flag (and rename the help to “Path to the Airbyte monorepo root”), wdyt?

-        epilog="""
+        epilog="""
 Examples:
   # Migrate 10 connectors (default count)
-  python v7_migration.py --connectors-path /path/to/connectors
+  python v7_migration.py --airbyte-repo-path /path/to/airbyte
 ...
-  python v7_migration.py --connectors-path /path/to/connectors --count 5
+  python v7_migration.py --airbyte-repo-path /path/to/airbyte --count 5
 ...
-  python v7_migration.py --connectors-path /path/to/connectors --count 1000
+  python v7_migration.py --airbyte-repo-path /path/to/airbyte --count 1000
 ...
-  python v7_migration.py --connectors-path ../connectors --count 3
+  python v7_migration.py --airbyte-repo-path ../airbyte --count 3
         """

Suggested change
def main():
parser = argparse.ArgumentParser(
description="Migrate Airbyte connectors to CDK version 7",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Migrate 10 connectors (default count)
python v7_migration.py --connectors-path /path/to/connectors
# Migrate 5 connectors
python v7_migration.py --connectors-path /path/to/connectors --count 5
# Migrate all eligible connectors (use a large number)
python v7_migration.py --connectors-path /path/to/connectors --count 1000
# Use relative path
python v7_migration.py --connectors-path ../connectors --count 3
"""
)
def main():
parser = argparse.ArgumentParser(
description="Migrate Airbyte connectors to CDK version 7",
formatter_class=argparse.RawDescriptionHelpFormatter,
epilog="""
Examples:
# Migrate 10 connectors (default count)
python v7_migration.py --airbyte-repo-path /path/to/airbyte
# Migrate 5 connectors
python v7_migration.py --airbyte-repo-path /path/to/airbyte --count 5
# Migrate all eligible connectors (use a large number)
python v7_migration.py --airbyte-repo-path /path/to/airbyte --count 1000
# Use relative path
python v7_migration.py --airbyte-repo-path ../airbyte --count 3
"""
)
# … rest of main() …
🤖 Prompt for AI Agents
In bin/v7_migration.py around lines 647 to 666, the CLI examples use
--connectors-path while the parser actually defines --airbyte-repo-path; update
the epilog examples to use --airbyte-repo-path (so examples match the actual
flag), and change the parser.add_argument help text for --airbyte-repo-path to
read "Path to the Airbyte monorepo root" to better describe the flag.


github-actions bot commented Sep 4, 2025

PyTest Results (Fast)

3 742 tests (-10) · 3 730 ✅ (-10) · 12 💤 (±0) · 0 ❌ (±0) · 1 suite (±0) · 1 file (±0) · 6m 22s ⏱️ (-7s)

Results for commit 6dc3ebd. ± Comparison against base commit 5378cef.

This pull request removes 91 tests and adds 81. Note that renamed tests count towards both.
unit_tests.manifest_server.test_auth.TestVerifyJwtToken ‑ test_empty_secret_allows_all_requests
unit_tests.manifest_server.test_auth.TestVerifyJwtToken ‑ test_expired_token_raises_401
unit_tests.manifest_server.test_auth.TestVerifyJwtToken ‑ test_invalid_token_raises_401
unit_tests.manifest_server.test_auth.TestVerifyJwtToken ‑ test_malformed_token_raises_401
unit_tests.manifest_server.test_auth.TestVerifyJwtToken ‑ test_missing_credentials_with_secret_raises_401
unit_tests.manifest_server.test_auth.TestVerifyJwtToken ‑ test_no_secret_allows_all_requests
unit_tests.manifest_server.test_auth.TestVerifyJwtToken ‑ test_token_without_required_claims_passes
unit_tests.manifest_server.test_auth.TestVerifyJwtToken ‑ test_valid_token_passes
unit_tests.manifest_server.test_auth.TestVerifyJwtToken ‑ test_wrong_secret_raises_401
unit_tests.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_latest_record_cursor_value_is_higher_than_slice_end-2021-01-01-stream_slice2-observed_records2-expected_state2]
…
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_latest_record_cursor_value_is_higher_than_slice_end-2021-01-01-stream_slice2-observed_records2-expected_state2]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_previous_cursor_is_highest-2023-01-01-stream_slice0-observed_records0-expected_state0]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_stream_slice_partition_end_is_highest-2020-01-01-stream_slice1-observed_records1-expected_state1]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_all_records_out_of_slice_and_no_previous_cursor-None-stream_slice9-observed_records9-expected_state9]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_all_records_out_of_slice_boundaries-2021-01-01-stream_slice8-observed_records8-expected_state8]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_no_records_observed-2021-01-01-stream_slice3-observed_records3-expected_state3]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_no_records_observed_and_no_previous_state-None-stream_slice4-observed_records4-expected_state4]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_out_of_order_records-2021-01-01-stream_slice6-observed_records6-expected_state6]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_with_some_records_out_of_slice_boundaries-2021-01-01-stream_slice7-observed_records7-expected_state7]
unit_tests.legacy.sources.declarative.incremental.test_datetime_based_cursor ‑ test_close_slice[test_close_slice_without_previous_cursor-None-stream_slice5-observed_records5-expected_state5]
…

♻️ This comment has been updated with latest results.

@maxi297 maxi297 changed the title from "v7 migration script" to "chore: v7 migration script" Sep 4, 2025

github-actions bot commented Sep 4, 2025

PyTest Results (Full)

3 745 tests (+3 745) · 3 733 ✅ (+3 733) · 12 💤 (+12) · 0 ❌ (±0) · 1 suite (+1) · 1 file (+1) · 11m 0s ⏱️ (+11m 0s)

Results for commit 6dc3ebd. ± Comparison against base commit 5378cef.

♻️ This comment has been updated with latest results.

@github-actions github-actions bot added the chore label Sep 4, 2025
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (8)
bin/v7_migration.py (8)

13-21: Fix Ruff I001: group and sort imports (stdlib vs third‑party).

This is currently failing CI. Shall we group stdlib imports first, then third-party, with each group sorted, wdyt?

import argparse
import re
import subprocess
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional

import toml
import yaml

678-690: Align CLI examples with the actual flag name.

Examples use --connectors-path but the parser defines --airbyte-repo-path. Shall we update the epilog to match, wdyt?

-  python v7_migration.py --connectors-path /path/to/connectors
+  python v7_migration.py --airbyte-repo-path /path/to/airbyte
@@
-  python v7_migration.py --connectors-path /path/to/connectors --count 5
+  python v7_migration.py --airbyte-repo-path /path/to/airbyte --count 5
@@
-  python v7_migration.py --connectors-path /path/to/connectors --count 1000
+  python v7_migration.py --airbyte-repo-path /path/to/airbyte --count 1000
@@
-  python v7_migration.py --connectors-path ../connectors --count 3
+  python v7_migration.py --airbyte-repo-path ../airbyte --count 3

354-381: Don't default sha256_hash to a placeholder that gets written into metadata; gate the update behind a real value.

A placeholder default risks committing a broken baseImage. Can we make the digest optional, skip the declarative update when absent, and add a CLI flag to pass it, wdyt?

-def migrate_to_cdk_v7(self, sha256_hash: str = "<TBD>") -> bool:
+def migrate_to_cdk_v7(self, sha256_hash: Optional[str] = None) -> bool:
@@
-        if self._uses_declarative_manifest():
-            success = self._migrate_declarative_manifest_to_v7(sha256_hash)
+        if self._uses_declarative_manifest():
+            if not sha256_hash:
+                print(f"Skipping declarative base image update for {self.name}: missing --manifest-image-digest")
+            else:
+                success = self._migrate_declarative_manifest_to_v7(sha256_hash) or success

Add a CLI flag and plumb it through main (see separate comment below). Want me to wire this up end-to-end?


399-437: Avoid regex-editing YAML; update the structured field and verify the change occurred.

Regex can hit the wrong key and returns success even if no replacement happens. Shall we load metadata.yaml, set data.connectorBuildOptions.baseImage, and dump back (preserving keys), and only return True when the value actually changed, wdyt?

-            # Load the current metadata
-            with open(metadata_file, "r") as f:
-                metadata_content = f.read()
-
-            # Update the baseImage to version 7
-            new_base_image = (
-                f"docker.io/airbyte/source-declarative-manifest:7.0.0@sha256:{sha256_hash}"
-            )
-
-            # Replace the base image using regex to preserve formatting
-            base_image_pattern = r"(baseImage:\s*)[^\n\r]+"
-            replacement = rf"\g<1>{new_base_image}"
-
-            updated_content = re.sub(base_image_pattern, replacement, metadata_content)
-
-            # Write back the updated metadata
-            with open(metadata_file, "w") as f:
-                f.write(updated_content)
+            with open(metadata_file, "r", encoding="utf-8") as f:
+                meta = yaml.safe_load(f) or {}
+            new_base_image = f"docker.io/airbyte/source-declarative-manifest:7.0.0@sha256:{sha256_hash}"
+            data = meta.setdefault("data", {})
+            cbo = data.setdefault("connectorBuildOptions", {})
+            old = cbo.get("baseImage")
+            cbo["baseImage"] = new_base_image
+            if old == new_base_image:
+                print(f"{self.name}: baseImage already at CDK v7")
+                return True
+            with open(metadata_file, "w", encoding="utf-8") as f:
+                yaml.safe_dump(meta, f, sort_keys=False)

If you’d like better formatting preservation, I can switch to ruamel.yaml, wdyt?


465-477: Handle Poetry’s dict/table dependency form for airbyte-cdk.

Poetry often declares dependencies as tables (extras, markers). Can we support both string and dict forms to avoid missing upgrades, wdyt?

-            poetry_deps = pyproject_data.get("tool", {}).get("poetry", {}).get("dependencies", {})
-            if isinstance(poetry_deps, dict) and "airbyte-cdk" in poetry_deps:
-                poetry_deps["airbyte-cdk"] = "^7"
-                updated = True
+            poetry_deps = pyproject_data.get("tool", {}).get("poetry", {}).get("dependencies", {})
+            if isinstance(poetry_deps, dict) and "airbyte-cdk" in poetry_deps:
+                dep = poetry_deps["airbyte-cdk"]
+                if isinstance(dep, str):
+                    poetry_deps["airbyte-cdk"] = "^7"
+                    updated = True
+                elif isinstance(dep, dict):
+                    dep["version"] = "^7"
+                    updated = True

693-705: Make --airbyte-repo-path required and validate connectors directory exists.

Path(None) will crash; also we should verify <repo>/airbyte-integrations/connectors is present. Shall we make the arg required and add clear checks, wdyt?

-    parser.add_argument(
-        "--airbyte-repo-path",
-        type=str,
-        default=None,
-        help="Path to the connectors directory (required)",
-    )
+    parser.add_argument(
+        "--airbyte-repo-path",
+        type=Path,
+        required=True,
+        help="Path to the Airbyte monorepo root (containing airbyte-integrations/)",
+    )
@@
-    airbyte_repo_path = Path(args.airbyte_repo_path).resolve()
-    if not airbyte_repo_path.exists():
-        print(f"Error: airbyte_repo_path path does not exist: {airbyte_repo_path}")
+    airbyte_repo_path: Path = args.airbyte_repo_path.resolve()
+    if not airbyte_repo_path.exists():
+        print(f"Error: repo path does not exist: {airbyte_repo_path}")
         return 1
@@
-    print(f"Using connectors path: {airbyte_repo_path}")
+    print(f"Using repo path: {airbyte_repo_path}")
@@
-        for connector_path in get_connector_directories(
-            airbyte_repo_path / "airbyte-integrations" / "connectors"
-        )
+        for connector_path in get_connector_directories(
+            airbyte_repo_path / "airbyte-integrations" / "connectors"
+        )
     ]
+    connectors_root = airbyte_repo_path / "airbyte-integrations" / "connectors"
+    if not connectors_root.exists():
+        print(f"Error: connectors directory not found at: {connectors_root}")
+        return 1

Also applies to: 709-721
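The required-argument setup can be sketched on its own with argparse's type=Path:

```python
import argparse
from pathlib import Path

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Migrate Airbyte connectors to CDK version 7")
    parser.add_argument(
        "--airbyte-repo-path",
        type=Path,
        required=True,
        help="Path to the Airbyte monorepo root (containing airbyte-integrations/)",
    )
    parser.add_argument("--count", type=int, default=10)
    return parser
```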


700-705: Add a CLI flag for the manifest base image digest and plumb it through.

This pairs with the earlier change to avoid defaulting to "". Shall we add the flag and pass it to migrate_to_cdk_v7, wdyt?

     parser.add_argument(
         "--count",
         type=int,
         default=10,
         help="Number of eligible connectors to migrate (default: 10)",
     )
+    parser.add_argument(
+        "--manifest-image-digest",
+        type=str,
+        default=None,
+        help="sha256 for docker.io/airbyte/source-declarative-manifest:7.0.0 (required to update declarative base image)",
+    )
@@
-        eligible_for_migration[i].migrate_to_cdk_v7()
+        eligible_for_migration[i].migrate_to_cdk_v7(sha256_hash=args.manifest_image_digest)

Also applies to: 734-737


742-743: Propagate the exit code from main.

Right now failures return 0. Shall we exit with main()’s return value, wdyt?

-if __name__ == "__main__":
-    main()
+if __name__ == "__main__":
+    raise SystemExit(main())
🧹 Nitpick comments (8)
bin/v7_migration.py (8)

471-477: Also update PEP 621 extras and preserve extras when present.

Project-level deps can include extras like "airbyte-cdk[singer]>=6,<7" and optional-dependencies. Shall we preserve extras and handle optional-dependencies, wdyt?

-            project_deps = pyproject_data.get("project", {}).get("dependencies", [])
+            project_table = pyproject_data.get("project", {})
+            project_deps = project_table.get("dependencies", [])
             if isinstance(project_deps, list):
                 for i, dep in enumerate(project_deps):
-                    if isinstance(dep, str) and dep.startswith("airbyte-cdk"):
-                        project_deps[i] = "airbyte-cdk>=7.0.0,<8.0.0"
+                    if isinstance(dep, str) and dep.startswith("airbyte-cdk"):
+                        # Preserve extras if present, e.g. airbyte-cdk[extra]
+                        m = re.match(r"^(airbyte-cdk(?:\[[^\]]+\])?).*$", dep)
+                        if m:
+                            project_deps[i] = f"{m.group(1)}>=7.0.0,<8.0.0"
                         updated = True
+            # Optional dependencies (extras)
+            opt_deps = project_table.get("optional-dependencies", {})
+            if isinstance(opt_deps, dict):
+                for extra, deps in opt_deps.items():
+                    if isinstance(deps, list):
+                        for i, dep in enumerate(deps):
+                            if isinstance(dep, str) and dep.startswith("airbyte-cdk"):
+                                m = re.match(r"^(airbyte-cdk(?:\[[^\]]+\])?).*$", dep)
+                                if m:
+                                    deps[i] = f"{m.group(1)}>=7.0.0,<8.0.0"
+                                    updated = True

651-654: Docstring says “source-* and destination-*” but code returns only sources; also sort for determinism.

Shall we fix the docstring and sort directories for stable runs, wdyt?

-def get_connector_directories(connectors_path: Path) -> List[Path]:
+def get_connector_directories(connectors_path: Path) -> List[Path]:
@@
-    Get all connector directories (source-* and destination-*).
+    Get all source connector directories (source-*).
@@
-    for item in connectors_path.iterdir():
+    for item in connectors_path.iterdir():
         if item.is_dir() and item.name.startswith("source-"):
             connector_dirs.append(item)
 
-    return connector_dirs
+    connector_dirs.sort(key=lambda p: p.name)
+    return connector_dirs

Also applies to: 666-671


142-175: Broaden detection of “unbounded v6” specs in pyproject.

You only match caret (“^6”). Some connectors use “>=6,<7” or “~6.x”. Should we include these patterns to avoid missing candidates, wdyt?

-                    if isinstance(cdk_version, str) and "^6" in cdk_version:
+                    if isinstance(cdk_version, str) and re.search(r"(\^6|\b>=\s*6(\.\d+)*\s*,\s*<\s*7\b|~=\s*6(\.\d+)*)", cdk_version):
                         return True
@@
-                            if "^6" in dep:
+                            if re.search(r"(\^6|\b>=\s*6(\.\d+)*\s*,\s*<\s*7\b|~=\s*6(\.\d+)*)", dep):
                                 return True

If you prefer robust parsing, I can switch this to use packaging.specifiers, wdyt?
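For reference, a packaging-based check could look something like this — a sketch only, assuming `packaging` is available in the venv; the probe versions are arbitrary:

```python
from packaging.requirements import Requirement


def is_unbounded_v6(dep: str) -> bool:
    """Return True if a PEP 508 requirement string pins airbyte-cdk to a
    6.x range that excludes 7.x (e.g. '>=6,<7' or '~=6.0')."""
    try:
        req = Requirement(dep)
    except Exception:
        return False
    if req.name != "airbyte-cdk":
        return False
    spec = req.specifier
    # Probe versions: the spec must admit some 6.x but no 7.x
    return spec.contains("6.5.0", prereleases=True) and not spec.contains(
        "7.0.0", prereleases=True
    )
```

This sidesteps hand-written regexes entirely and handles extras and markers for free.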


47-51: Use explicit UTF‑8 encoding for file IO.

For cross‑platform consistency, shall we add encoding="utf-8" to reads/writes, wdyt?

Example changes:

-with open(metadata_file, "r") as f:
+with open(metadata_file, "r", encoding="utf-8") as f:

-with open(manifest_file, "r") as f:
+with open(manifest_file, "r", encoding="utf-8") as f:

-with open(metadata_file, "w") as f:
+with open(metadata_file, "w", encoding="utf-8") as f:

-with open(pyproject_file, "r") as f:
+with open(pyproject_file, "r", encoding="utf-8") as f:

-with open(pyproject_file, "w") as f:
+with open(pyproject_file, "w", encoding="utf-8") as f:

-with open(changelog_file, "r", encoding="utf-8") as f:
+with open(changelog_file, "r", encoding="utf-8") as f:

-with open(changelog_file, "w", encoding="utf-8") as f:
+with open(changelog_file, "w", encoding="utf-8") as f:

Also applies to: 85-90, 431-433, 459-461, 479-482, 563-565, 594-596


311-344: Reduce false positives when scanning deprecated imports by using AST.

String matching can miss aliased imports and multi-line forms. Should we parse Python files with ast and look for import nodes referencing the target names/modules, wdyt?

I can replace the loop with an ast-based check that safely handles aliases and multiline imports while ignoring comments. Want me to draft that?
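To make the offer concrete, a minimal sketch of such an ast-based check — the deprecated names below are placeholders, not the script's actual list:

```python
import ast

# Placeholder names; the real script would supply its own deprecated list.
DEPRECATED_NAMES = {"LegacyCursor", "DeprecatedPartitionRouter"}


def imports_deprecated_name(source: str) -> bool:
    """Detect imports of deprecated names, including aliased and
    multi-line imports; comments and strings are ignored."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom):
            if any(alias.name in DEPRECATED_NAMES for alias in node.names):
                return True
        elif isinstance(node, ast.Import):
            if any(alias.name.split(".")[-1] in DEPRECATED_NAMES for alias in node.names):
                return True
    return False
```

Because it walks import nodes rather than raw text, an occurrence in a comment or docstring never triggers a false positive.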


723-733: Variable naming nit: v6_declarative_sources includes Python connectors too.

To avoid confusion, shall we rename to v6_connectors (or similar), wdyt?

-    v6_declarative_sources = [c for c in connectors if c.is_using_unbounded_cdk_version_6()]
-    eligible_for_migration = [c for c in v6_declarative_sources if c.is_eligible_for_migration()]
-    non_eligible_for_migration = [
-        c for c in v6_declarative_sources if not c.is_eligible_for_migration()
+    v6_connectors = [c for c in connectors if c.is_using_unbounded_cdk_version_6()]
+    eligible_for_migration = [c for c in v6_connectors if c.is_eligible_for_migration()]
+    non_eligible_for_migration = [
+        c for c in v6_connectors if not c.is_eligible_for_migration()
     ]
@@
-        f"{len(eligible_for_migration)} sources eligible for automatic migration out of {len(v6_declarative_sources)} sources using CDK version 6"
+        f"{len(eligible_for_migration)} sources eligible for automatic migration out of {len(v6_connectors)} sources using CDK version 6"

3-12: CLI/docs polish: remove TODO and add digest flag to “How to use.”

Since we’ll add --manifest-image-digest, shall we update this header and drop the TODO, wdyt?

-# * Have a venv with yaml and toml
+# * Have a venv with yaml and toml installed
@@
-# * Find and replace `[TBD](https://github.com/airbytehq/airbyte/pull/TBD)`
+# * Pass --manifest-image-digest for declarative sources; update PR link in changelog as needed
@@
-#  * Set the hash for the source-declarative-manifest docker image (hardcoded as "<TBD>" for now)
+#  * Provide the hash for the source-declarative-manifest docker image via --manifest-image-digest

716-721: Make migration order deterministic and consider skipping known exceptions via a denylist.

For reproducibility, shall we sort eligible_for_migration by name and optionally accept a --skip list or a built-in denylist for the connectors called out in the PR description, wdyt?

-    connectors = [
+    connectors = [
         Connector(connector_path)
         for connector_path in get_connector_directories(
             airbyte_repo_path / "airbyte-integrations" / "connectors"
         )
     ]
+    connectors.sort(key=lambda c: c.name)
@@
-    for i in range(count_to_migrate):
-        eligible_for_migration[i].migrate_to_cdk_v7(sha256_hash=args.manifest_image_digest)
+    for connector in sorted(eligible_for_migration, key=lambda c: c.name)[:count_to_migrate]:
+        connector.migrate_to_cdk_v7(sha256_hash=args.manifest_image_digest)

If you want, I can add a --skip arg and a small built-in denylist for the “custom” sources mentioned, wdyt?

Also applies to: 734-739
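If helpful, the --skip flag plus denylist could be as simple as this sketch — the denylisted names are just a few from the PR description and are illustrative only:

```python
import argparse

# Illustrative denylist drawn from the PR description; the real list
# would live in the script and be kept up to date.
DEFAULT_DENYLIST = {"source-bing-ads", "source-gong", "source-google-sheets"}


def parse_args(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--skip",
        action="append",
        default=[],
        help="Connector name to skip (repeatable); merged with the built-in denylist",
    )
    return parser.parse_args(argv)


def apply_denylist(connector_names, args):
    skipped = DEFAULT_DENYLIST | set(args.skip)
    return [name for name in connector_names if name not in skipped]
```

Filtering happens after eligibility detection, so the summary counts still reflect everything the scan found.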

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 13d395f and 734139a.

📒 Files selected for processing (1)
  • bin/v7_migration.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
bin/v7_migration.py (1)
unit_tests/source_declarative_manifest/conftest.py (1)
  • components_file (87-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-shopify
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (5)
bin/v7_migration.py (5)

799-800: Propagate exit code

main() returns a code but the process always exits 0. Shall we exit with SystemExit(main()), wdyt?

-if __name__ == "__main__":
-    main()
+if __name__ == "__main__":
+    raise SystemExit(main())

410-427: Avoid regex-editing YAML and remove hardcoded digest

Editing metadata.yaml via regex is brittle, and pinning a digest in code risks staleness. Shall we (1) update data.connectorBuildOptions.baseImage via YAML and (2) pass the digest from a CLI flag, skipping the base image update if not provided, wdyt?

-    def _migrate_declarative_manifest_to_v7(self) -> bool:
+    def _migrate_declarative_manifest_to_v7(self, sha256_digest: Optional[str]) -> bool:
@@
-            # Load the current metadata
-            with open(metadata_file, "r") as f:
-                metadata_content = f.read()
-
-            # Update the baseImage to version 7
-            new_base_image = (
-                f"docker.io/airbyte/source-declarative-manifest:7.0.1@sha256:ff1e701c8f913cf24a0220f62c8e64cc1c3011ba0a636985f4db47fdab1391b6"
-            )
-
-            # Replace the base image using regex to preserve formatting
-            base_image_pattern = r"(baseImage:\s*)[^\n\r]+"
-            replacement = rf"\g<1>{new_base_image}"
-
-            updated_content = re.sub(base_image_pattern, replacement, metadata_content)
-
-            # Write back the updated metadata with the new base image
-            with open(metadata_file, "w") as f:
-                f.write(updated_content)
+            with open(metadata_file, "r", encoding="utf-8") as f:
+                meta = yaml.safe_load(f) or {}
+            if not sha256_digest:
+                print(f"Skipping baseImage update for {self.name}: missing --manifest-image-digest")
+            else:
+                new_base_image = f"docker.io/airbyte/source-declarative-manifest:7.0.1@sha256:{sha256_digest}"
+                meta.setdefault("data", {}).setdefault("connectorBuildOptions", {})["baseImage"] = new_base_image
+                with open(metadata_file, "w", encoding="utf-8") as f:
+                    yaml.safe_dump(meta, f, sort_keys=False)

465-477: Preserve Poetry tables and PEP 621 extras/markers when bumping to v7

For Poetry, airbyte-cdk can be a table; for PEP 621, bumping by string replace drops extras/markers. Shall we keep extras/markers intact using packaging.Requirement, wdyt?

-            poetry_deps = pyproject_data.get("tool", {}).get("poetry", {}).get("dependencies", {})
-            if isinstance(poetry_deps, dict) and "airbyte-cdk" in poetry_deps:
-                poetry_deps["airbyte-cdk"] = "^7"
-                updated = True
+            poetry_deps = pyproject_data.get("tool", {}).get("poetry", {}).get("dependencies", {})
+            if isinstance(poetry_deps, dict) and "airbyte-cdk" in poetry_deps:
+                dep = poetry_deps["airbyte-cdk"]
+                if isinstance(dep, str):
+                    poetry_deps["airbyte-cdk"] = "^7"
+                elif isinstance(dep, dict):
+                    dep["version"] = "^7"
+                updated = True
@@
-            if isinstance(project_deps, list):
-                for i, dep in enumerate(project_deps):
-                    if isinstance(dep, str) and dep.startswith("airbyte-cdk"):
-                        project_deps[i] = "airbyte-cdk>=7.0.0,<8.0.0"
-                        updated = True
+            if isinstance(project_deps, list):
+                for i, dep in enumerate(project_deps):
+                    if isinstance(dep, str):
+                        try:
+                            req = Requirement(dep)
+                        except Exception:
+                            continue
+                        if req.name == "airbyte-cdk":
+                            req.specifier = SpecifierSet(">=7.0.0,<8.0.0")
+                            project_deps[i] = str(req)
+                            updated = True

Also applies to: 471-477


731-747: CLI polish: examples mismatch flag, arg should be required, and validate connectors dir

Examples use --connectors-path but the parser defines --airbyte-repo-path. Also --airbyte-repo-path is optional yet immediately used, and we don’t verify <repo>/airbyte-integrations/connectors exists. Shall we align examples, make the arg required, validate the connectors dir, and add a --manifest-image-digest flag, wdyt?

     parser = argparse.ArgumentParser(
         description="Migrate Airbyte connectors to CDK version 7",
         formatter_class=argparse.RawDescriptionHelpFormatter,
         epilog="""
 Examples:
   # Migrate 10 connectors (default count)
-  python v7_migration.py --connectors-path /path/to/connectors
+  python v7_migration.py --airbyte-repo-path /path/to/airbyte
@@
   # Migrate 5 connectors
-  python v7_migration.py --connectors-path /path/to/connectors --count 5
+  python v7_migration.py --airbyte-repo-path /path/to/airbyte --count 5
@@
   # Migrate all eligible connectors (use a large number)
-  python v7_migration.py --connectors-path /path/to/connectors --count 1000
+  python v7_migration.py --airbyte-repo-path /path/to/airbyte --count 1000
@@
   # Use relative path
-  python v7_migration.py --connectors-path ../connectors --count 3
+  python v7_migration.py --airbyte-repo-path ../airbyte --count 3
         """,
     )
@@
-    parser.add_argument(
-        "--airbyte-repo-path",
-        type=str,
-        default=None,
-        help="Path to the connectors directory (required)",
-    )
+    parser.add_argument(
+        "--airbyte-repo-path",
+        type=Path,
+        required=True,
+        help="Path to the Airbyte monorepo root (containing airbyte-integrations/)",
+    )
+    parser.add_argument(
+        "--manifest-image-digest",
+        type=str,
+        default=None,
+        help="sha256 digest for docker.io/airbyte/source-declarative-manifest:7.0.1 (optional; skips baseImage update if omitted)",
+    )
@@
-    airbyte_repo_path = Path(args.airbyte_repo_path).resolve()
-    if not airbyte_repo_path.exists():
-        print(f"Error: airbyte_repo_path path does not exist: {airbyte_repo_path}")
+    airbyte_repo_path: Path = args.airbyte_repo_path.resolve()
+    if not airbyte_repo_path.exists():
+        print(f"Error: repo path does not exist: {airbyte_repo_path}")
         return 1
@@
-    print(f"Using connectors path: {airbyte_repo_path}")
+    print(f"Using repo path: {airbyte_repo_path}")
@@
-    connectors = [
-        Connector(connector_path)
-        for connector_path in get_connector_directories(
-            airbyte_repo_path / "airbyte-integrations" / "connectors"
-        )
-    ]
+    connectors_root = airbyte_repo_path / "airbyte-integrations" / "connectors"
+    if not connectors_root.exists():
+        print(f"Error: connectors directory not found at: {connectors_root}")
+        return 1
+    connectors = [Connector(p) for p in get_connector_directories(connectors_root)]
@@
-        eligible_for_migration[i].migrate_to_cdk_v7()
+        eligible_for_migration[i].migrate_to_cdk_v7(sha256_hash=args.manifest_image_digest)

Also applies to: 751-756, 766-769, 773-778, 791-795


352-376: Thread the digest from CLI and call the updated helper

If we add a digest flag, we’ll need to thread it through. Shall we extend the signature and call site, wdyt?

-    def migrate_to_cdk_v7(self) -> bool:
+    def migrate_to_cdk_v7(self, sha256_hash: Optional[str] = None) -> bool:
@@
-        if self._uses_declarative_manifest():
-            success = self._migrate_declarative_manifest_to_v7()
+        if self._uses_declarative_manifest():
+            success = self._migrate_declarative_manifest_to_v7(sha256_hash)
🧹 Nitpick comments (5)
bin/v7_migration.py (5)

589-656: Keep the CDK version label consistent in changelog vs base image

Changelog says “v7.0.0” but base image uses “7.0.1”. Would you prefer “Update to CDK v7” to avoid drift, or centralize the version in a constant used in both places, wdyt?

-            new_entry = f"| {next_version} | {current_date} | [TBD](https://github.com/airbytehq/airbyte/pull/TBD) | Update to CDK v7.0.0 |"
+            new_entry = f"| {next_version} | {current_date} | [TBD](https://github.com/airbytehq/airbyte/pull/TBD) | Update to CDK v7 |"

Also applies to: 630-632, 416-417
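If centralizing is the preferred route, a sketch — the version string and helper names here are illustrative, not the script's current identifiers:

```python
# Single source of truth for the target CDK version; the changelog text and
# the base image tag both derive from it, so they cannot drift apart.
TARGET_CDK_VERSION = "7.0.1"


def base_image(sha256_digest: str) -> str:
    return (
        "docker.io/airbyte/source-declarative-manifest:"
        f"{TARGET_CDK_VERSION}@sha256:{sha256_digest}"
    )


def changelog_entry(next_version: str, current_date: str, pr_url: str) -> str:
    return (
        f"| {next_version} | {current_date} | [TBD]({pr_url}) "
        f"| Update to CDK v{TARGET_CDK_VERSION} |"
    )
```

A single constant also makes the next CDK major bump a one-line change.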


50-92: Return None instead of empty string for missing manifest (aligns with Optional[str])

The property advertises Optional[str] but caches empty string on miss. Shall we return None to match the type and simplify truthiness checks, wdyt?

-        # No manifest found in any location
-        self._manifest_content = ""
-        return self._manifest_content
+        # No manifest found in any location
+        self._manifest_content = None
+        return self._manifest_content

44-46: Specify UTF-8 encoding for file I/O

Several reads/writes omit encoding="utf-8". Adding it avoids locale-dependent bugs. Shall we standardize on UTF-8 across the script, wdyt?

-            with open(metadata_file, "r") as f:
+            with open(metadata_file, "r", encoding="utf-8") as f:
                 return yaml.safe_load(f)
@@
-            with open(metadata_file, "r") as f:
+            with open(metadata_file, "r", encoding="utf-8") as f:
                 metadata_content = f.read()
@@
-            with open(pyproject_file, "r") as f:
+            with open(pyproject_file, "r", encoding="utf-8") as f:
                 pyproject_data = toml.load(f)
@@
-                with open(pyproject_file, "w") as f:
+                with open(pyproject_file, "w", encoding="utf-8") as f:
                     # WARNING: using toml to rewrite stuff can lead to changing the order/the formatting on some things
                     toml.dump(pyproject_data, f)
@@
-            with open(metadata_file, "r") as f:
+            with open(metadata_file, "r", encoding="utf-8") as f:
                 metadata_content = f.read()
@@
-                with open(metadata_file, "w") as f:
+                with open(metadata_file, "w", encoding="utf-8") as f:
                     f.write(updated_content)
@@
-            with open(changelog_file, "w", encoding="utf-8") as f:
+            with open(changelog_file, "w", encoding="utf-8") as f:
                 f.write(updated_content)

Also applies to: 411-427, 460-483, 522-541, 617-653


709-716: Docstring says “source-* and destination-*” but code filters only sources

Either update the docstring to “source-* only” or include destinations in the filter. Which do you prefer, wdyt?

-    Get all connector directories (source-* and destination-*).
+    Get all source connector directories (source-*).

771-771: Log wording nit: repo vs connectors path

This prints the repo root, not a connectors path. Shall we reword to “Using repo path”, wdyt?

-    print(f"Using connectors path: {airbyte_repo_path}")
+    print(f"Using repo path: {airbyte_repo_path}")
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 734139a and c63f9b6.

📒 Files selected for processing (1)
  • bin/v7_migration.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
bin/v7_migration.py

[error] 357-363: Ruff format check failed. 1 file would be reformatted by 'ruff format --diff'. Run 'ruff format' to apply formatting changes. Command: poetry run ruff format --diff .


[error] 412-420: Ruff format check failed. 1 file would be reformatted by 'ruff format --diff'. Run 'ruff format' to apply formatting changes. Command: poetry run ruff format --diff .


[error] 525-545: Ruff format check failed. 1 file would be reformatted by 'ruff format --diff'. Run 'ruff format' to apply formatting changes. Command: poetry run ruff format --diff .

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-intercom
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
🔇 Additional comments (1)
bin/v7_migration.py (1)

357-363: Run ruff format to apply formatting fixes
CI is currently flagging formatting errors in the specified ranges—can you run ruff format . (or target bin/v7_migration.py) and commit the resulting diff to unblock CI? wdyt?

Comment on lines +140 to +172
pyproject_file = self.path / "pyproject.toml"
if pyproject_file.exists():
    try:
        with open(pyproject_file, "r") as f:
            pyproject_data = toml.load(f)

        # Check dependencies in different sections
        dependencies_sections = [
            pyproject_data.get("tool", {}).get("poetry", {}).get("dependencies", {}),
            pyproject_data.get("project", {}).get("dependencies", []),
        ]

        # Check poetry dependencies (dict format)
        poetry_deps = dependencies_sections[0]
        if isinstance(poetry_deps, dict) and "airbyte-cdk" in poetry_deps:
            cdk_version = poetry_deps["airbyte-cdk"]
            # Check if the version specification contains a caret sign
            if isinstance(cdk_version, str) and "^6" in cdk_version:
                return True

        # Check PEP 621 dependencies (list format)
        project_deps = dependencies_sections[1]
        if isinstance(project_deps, list):
            for dep in project_deps:
                if isinstance(dep, str) and dep.startswith("airbyte-cdk"):
                    # Check if the dependency specification contains a caret sign
                    if "^6" in dep:
                        return True

    except Exception as e:
        print(f"Error reading pyproject.toml for {self.name}: {e}")

return False

🛠️ Refactor suggestion

Broaden v6 detection for PEP 621 deps and Poetry table deps

The PEP 621 path only checks for the literal “^6”, which won’t occur (PEP 621 uses PEP 440 specifiers). Also Poetry often defines airbyte-cdk as a table. Could we (a) handle Poetry dicts and (b) parse PEP 621 strings with packaging to detect “>=6,<7” ranges, wdyt?

+from packaging.requirements import Requirement
+from packaging.specifiers import SpecifierSet
@@
-                poetry_deps = dependencies_sections[0]
-                if isinstance(poetry_deps, dict) and "airbyte-cdk" in poetry_deps:
-                    cdk_version = poetry_deps["airbyte-cdk"]
-                    # Check if the version specification contains a caret sign
-                    if isinstance(cdk_version, str) and "^6" in cdk_version:
-                        return True
+                poetry_deps = dependencies_sections[0]
+                if isinstance(poetry_deps, dict) and "airbyte-cdk" in poetry_deps:
+                    dep = poetry_deps["airbyte-cdk"]
+                    if isinstance(dep, str) and dep.strip().startswith("^6"):
+                        return True
+                    if isinstance(dep, dict):
+                        ver = str(dep.get("version", "")).strip()
+                        if ver.startswith("^6"):
+                            return True
@@
-                project_deps = dependencies_sections[1]
-                if isinstance(project_deps, list):
-                    for dep in project_deps:
-                        if isinstance(dep, str) and dep.startswith("airbyte-cdk"):
-                            # Check if the dependency specification contains a caret sign
-                            if "^6" in dep:
-                                return True
+                project_deps = dependencies_sections[1]
+                if isinstance(project_deps, list):
+                    for dep in project_deps:
+                        if isinstance(dep, str):
+                            try:
+                                req = Requirement(dep)
+                            except Exception:
+                                continue
+                            if req.name == "airbyte-cdk":
+                                # Treat as v6 if spec includes 6.x but excludes 7.x
+                                spec: SpecifierSet = req.specifier
+                                if spec.contains("6.0.0", prereleases=True) and not spec.contains("7.0.0", prereleases=True):
+                                    return True
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

pyproject_file = self.path / "pyproject.toml"
if pyproject_file.exists():
    try:
        with open(pyproject_file, "r") as f:
            pyproject_data = toml.load(f)

        # Check dependencies in different sections
        dependencies_sections = [
            pyproject_data.get("tool", {}).get("poetry", {}).get("dependencies", {}),
            pyproject_data.get("project", {}).get("dependencies", []),
        ]

        # Check poetry dependencies (dict format)
        poetry_deps = dependencies_sections[0]
        if isinstance(poetry_deps, dict) and "airbyte-cdk" in poetry_deps:
            cdk_version = poetry_deps["airbyte-cdk"]
            # Check if the version specification contains a caret sign
            if isinstance(cdk_version, str) and "^6" in cdk_version:
                return True

        # Check PEP 621 dependencies (list format)
        project_deps = dependencies_sections[1]
        if isinstance(project_deps, list):
            for dep in project_deps:
                if isinstance(dep, str) and dep.startswith("airbyte-cdk"):
                    # Check if the dependency specification contains a caret sign
                    if "^6" in dep:
                        return True

    except Exception as e:
        print(f"Error reading pyproject.toml for {self.name}: {e}")

return False

# At the top of bin/v7_migration.py, add these imports:
from packaging.requirements import Requirement
from packaging.specifiers import SpecifierSet

pyproject_file = self.path / "pyproject.toml"
if pyproject_file.exists():
    try:
        with open(pyproject_file, "r") as f:
            pyproject_data = toml.load(f)

        # Check dependencies in different sections
        dependencies_sections = [
            pyproject_data.get("tool", {}).get("poetry", {}).get("dependencies", {}),
            pyproject_data.get("project", {}).get("dependencies", []),
        ]

        # Check poetry dependencies (dict format)
        poetry_deps = dependencies_sections[0]
        if isinstance(poetry_deps, dict) and "airbyte-cdk" in poetry_deps:
            dep = poetry_deps["airbyte-cdk"]
            if isinstance(dep, str) and dep.strip().startswith("^6"):
                return True
            if isinstance(dep, dict):
                ver = str(dep.get("version", "")).strip()
                if ver.startswith("^6"):
                    return True

        # Check PEP 621 dependencies (list format)
        project_deps = dependencies_sections[1]
        if isinstance(project_deps, list):
            for dep in project_deps:
                if isinstance(dep, str):
                    try:
                        req = Requirement(dep)
                    except Exception:
                        continue
                    if req.name == "airbyte-cdk":
                        # Treat as v6 if spec includes 6.x but excludes 7.x
                        spec: SpecifierSet = req.specifier
                        if spec.contains("6.0.0", prereleases=True) and not spec.contains("7.0.0", prereleases=True):
                            return True

    except Exception as e:
        print(f"Error reading pyproject.toml for {self.name}: {e}")

return False
🤖 Prompt for AI Agents
In bin/v7_migration.py around lines 140-172, the current logic only detects a
literal "^6" in Poetry string deps and PEP 621 list strings, missing Poetry
table entries and PEP 621 table or PEP 440 specifiers like ">=6,<7"; update
detection to: treat Poetry dependencies values that are dicts/tables (e.g., {
"version": ">=6,<7" } or nested tables) by extracting their version field and
also handle PEP 621 entries that can be either strings or tables (dicts) by
normalizing to a version specifier string; then use packaging
(packaging.requirements.Requirement or packaging.specifiers.SpecifierSet) to
parse the requirement/specifier and return True when the specifier allows any
6.x version (e.g., spec intersects the range >=6,<7) or when a caret-style
string beginning with ^6 is present—ensure exceptions are caught and logged as
before.

Comment on lines +343 to +351
def is_eligible_for_migration(self) -> bool:
    return (
        not self.has_custom_incremental_sync()
        and not self.has_custom_retriever()
        and not self.has_custom_partition_router()
        and not self.imports_deprecated_class()
        and not self.has_pyproject_toml()
    )


🛠️ Refactor suggestion

Python connectors are excluded from eligibility by mistake

is_eligible_for_migration() returns False for any connector with a pyproject.toml, so Python sources detected in is_using_unbounded_cdk_version_6() can never be migrated. Shall we allow Python connectors that don’t use custom components or deprecated classes, wdyt?

 def is_eligible_for_migration(self) -> bool:
-        return (
-            not self.has_custom_incremental_sync()
-            and not self.has_custom_retriever()
-            and not self.has_custom_partition_router()
-            and not self.imports_deprecated_class()
-            and not self.has_pyproject_toml()
-        )
+        return (
+            not self.has_custom_incremental_sync()
+            and not self.has_custom_retriever()
+            and not self.has_custom_partition_router()
+            and not self.imports_deprecated_class()
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def is_eligible_for_migration(self) -> bool:
    return (
        not self.has_custom_incremental_sync()
        and not self.has_custom_retriever()
        and not self.has_custom_partition_router()
        and not self.imports_deprecated_class()
        and not self.has_pyproject_toml()
    )

def is_eligible_for_migration(self) -> bool:
    return (
        not self.has_custom_incremental_sync()
        and not self.has_custom_retriever()
        and not self.has_custom_partition_router()
        and not self.imports_deprecated_class()
    )
🤖 Prompt for AI Agents
In bin/v7_migration.py around lines 343 to 351, the current eligibility check
incorrectly excludes any connector with a pyproject.toml (preventing Python
connectors from being migrated); remove the "and not self.has_pyproject_toml()"
clause from the returned boolean expression so Python-based connectors that do
not use custom incremental syncs, custom retrievers, custom partition routers,
or deprecated classes remain eligible for migration.

@maxi297
Contributor Author

maxi297 commented Sep 8, 2025

/autofix

Auto-Fix Job Info

This job attempts to auto-fix any linting or formating issues. If any fixes are made,
those changes will be automatically committed and pushed back to the PR.

Note: This job can only be run by maintainers. On PRs from forks, this command requires
that the PR author has enabled the Allow edits from maintainers option.

PR auto-fix job started... Check job output.

✅ Changes applied successfully.
