-
Notifications
You must be signed in to change notification settings - Fork 30
chore: v7 migration script #747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/cdk7_migration_script#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/cdk7_migration_script Helpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
/autofix |
Caution Review failedAn error occurred during the review process. Please try again later. ✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Nitpick comments (8)
bin/v7_migration.py (8)
456-462
: Broaden PEP 621 dependency detection
startswith('airbyte-cdk')
misses common forms (extras and varied specifiers). Would a regex handleairbyte-cdk
, optional extras, and any comparator, wdyt?- for i, dep in enumerate(project_deps): - if isinstance(dep, str) and dep.startswith('airbyte-cdk'): - project_deps[i] = 'airbyte-cdk>=7.0.0,<8.0.0' + version_rx = re.compile(r'^airbyte-cdk(\[.*\])?\s*[~<>=!].*') + for i, dep in enumerate(project_deps): + if isinstance(dep, str) and version_rx.match(dep): + # Preserve extras if any + extras = re.match(r'^airbyte-cdk(\[.*\])?', dep).group(1) or '' + project_deps[i] = f'airbyte-cdk{extras}>=7.0.0,<8.0.0' updated = True
108-174
: Detect more v6 specs in pyprojectCurrently only caret
^6
is caught. Many connectors use ranges like>=6,<7
or pinned==6.*
. Shall we detect these too to widen coverage, wdyt?- if isinstance(cdk_version, str) and '^6' in cdk_version: + if isinstance(cdk_version, str) and re.search(r'(\^6|>=\s*6(\.\d+)*\s*,\s*<\s*7|==\s*6(\.\d+)*)', cdk_version): return TrueAnd similarly for PEP 621 list entries:
- if '^6' in dep: + if re.search(r'(\^6|>=\s*6(\.\d+)*\s*,\s*<\s*7|==\s*6(\.\d+)*)', dep): return True
519-537
: Changelog parsing is fragile; tie version extraction to the table
_extract_current_version
finds the first| x.y.z |
anywhere, which might not be the changelog’s top entry. Shall we parse the table header and then read the next data row to extract the current version, wdyt?- current_version = self._extract_current_version(content) + current_version = self._extract_current_version(content) if not current_version: print(f"Could not determine current version for {self.name}") return FalseOutside these ranges, consider replacing
_extract_current_version
with:def _extract_current_version(self, content: str) -> Optional[str]: table_rx = re.compile(r'(\| Version \| Date.*?\n\| :--.*?\n)(?P<row>\|.*\n)', re.S) m = table_rx.search(content) if not m: return None first_row = m.group('row') ver_m = re.search(r'\|\s*(\d+\.\d+\.\d+)\s*\|', first_row) return ver_m.group(1) if ver_m else NoneAlso applies to: 543-571, 579-597
624-645
: Docstring says “source- and destination-*” but code returns only sources*Should we fix the docstring (or extend the function) to avoid confusion, wdyt?
- Get all connector directories (source-* and destination-*). + Get all source connector directories (source-*).
52-94
: manifest_content typing vs behavior and search pathsThe property is annotated
Optional[str]
but always returns a string (possibly empty). Also, Python packages sometimes nest undersrc/
. Should we (a) returnNone
when absent and (b) also probeself.path/'src'/package_name/manifest.yaml
, wdyt?
266-339
: Import scanning could false-positive/negative; consider AST-based passString scanning may catch commented imports or miss multiline imports. Would you like a quick
ast
walk to find real Import/ImportFrom nodes and identifiers, wdyt?
696-703
: Report non-eligibility reasons to aid manual follow-upWe print only names of non-eligible connectors. Shall we log per-connector reasons (e.g., which checks failed) to speed up manual migrations, wdyt?
9-11
: TODO: supply declarative image digestDo you want to add a small README note and make the new
--manifest-image-digest
flag explicit in the CLI help so we don’t forget before running this at scale, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
bin/v7_migration.py
(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
bin/v7_migration.py
[error] 13-13: Ruff I001: Import block is un-sorted or un-formatted. Organize imports. Command: poetry run ruff check .
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Analyze (python)
bin/v7_migration.py
Outdated
def _migrate_declarative_manifest_to_v7(self, sha256_hash: str) -> bool: | ||
""" | ||
Migrate a declarative manifest connector to use CDK v7 base image. | ||
|
||
Args: | ||
sha256_hash: The SHA256 hash for the v7.0.0 base image | ||
|
||
Returns: | ||
True if migration was successful, False otherwise | ||
""" | ||
try: | ||
metadata_file = self.path / "metadata.yaml" | ||
|
||
if not metadata_file.exists(): | ||
print(f"No metadata.yaml found for {self.name}") | ||
return False | ||
|
||
# Load the current metadata | ||
with open(metadata_file, 'r') as f: | ||
metadata_content = f.read() | ||
|
||
# Update the baseImage to version 7 | ||
new_base_image = f"docker.io/airbyte/source-declarative-manifest:7.0.0@sha256:{sha256_hash}" | ||
|
||
# Replace the base image using regex to preserve formatting | ||
base_image_pattern = r'(baseImage:\s*)[^\n\r]+' | ||
replacement = rf'\g<1>{new_base_image}' | ||
|
||
updated_content = re.sub(base_image_pattern, replacement, metadata_content) | ||
|
||
# Write back the updated metadata | ||
with open(metadata_file, 'w') as f: | ||
f.write(updated_content) | ||
|
||
print(f"Successfully migrated {self.name} to use CDK v7 declarative manifest") | ||
return True | ||
|
||
except Exception as e: | ||
print(f"Error migrating declarative manifest for {self.name}: {e}") | ||
return False | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Avoid regex-writing YAML; update structured field instead
Editing YAML via regex risks touching the wrong key and breaking formatting. Shall we load, update data.connectorBuildOptions.baseImage
, and dump back, wdyt?
- # Load the current metadata
- with open(metadata_file, 'r') as f:
- metadata_content = f.read()
-
- # Update the baseImage to version 7
- new_base_image = f"docker.io/airbyte/source-declarative-manifest:7.0.0@sha256:{sha256_hash}"
-
- # Replace the base image using regex to preserve formatting
- base_image_pattern = r'(baseImage:\s*)[^\n\r]+'
- replacement = rf'\g<1>{new_base_image}'
-
- updated_content = re.sub(base_image_pattern, replacement, metadata_content)
-
- # Write back the updated metadata
- with open(metadata_file, 'w') as f:
- f.write(updated_content)
+ with open(metadata_file, 'r', encoding='utf-8') as f:
+ meta = yaml.safe_load(f)
+ new_base_image = f"docker.io/airbyte/source-declarative-manifest:7.0.0@sha256:{sha256_hash}"
+ meta.setdefault('data', {}).setdefault('connectorBuildOptions', {})['baseImage'] = new_base_image
+ with open(metadata_file, 'w', encoding='utf-8') as f:
+ yaml.safe_dump(meta, f, sort_keys=False)
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In bin/v7_migration.py around lines 386 to 426, the function currently edits
metadata.yaml with a regex which can target the wrong key and break YAML;
instead load the file as YAML, update the structured field
data.connectorBuildOptions.baseImage to
"docker.io/airbyte/source-declarative-manifest:7.0.0@sha256:<sha256_hash>"
(create connectorBuildOptions and baseImage keys if missing), then dump the YAML
back to the file using a YAML library (safe_load/safe_dump or ruamel.yaml to
better preserve formatting) and handle IO/errors accordingly.
bin/v7_migration.py
Outdated
# Check poetry dependencies (dict format) | ||
poetry_deps = pyproject_data.get('tool', {}).get('poetry', {}).get('dependencies', {}) | ||
if isinstance(poetry_deps, dict) and 'airbyte-cdk' in poetry_deps: | ||
poetry_deps['airbyte-cdk'] = '^7' | ||
updated = True | ||
|
||
# Check PEP 621 dependencies (list format) | ||
project_deps = pyproject_data.get('project', {}).get('dependencies', []) | ||
if isinstance(project_deps, list): | ||
for i, dep in enumerate(project_deps): | ||
if isinstance(dep, str) and dep.startswith('airbyte-cdk'): | ||
project_deps[i] = 'airbyte-cdk>=7.0.0,<8.0.0' | ||
updated = True | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Handle Poetry dependency as a table (extras, markers)
Poetry often defines airbyte-cdk
as a table: {version="^6", extras=[...]}
. Could we update both string and table forms, wdyt?
- if isinstance(poetry_deps, dict) and 'airbyte-cdk' in poetry_deps:
- poetry_deps['airbyte-cdk'] = '^7'
- updated = True
+ if isinstance(poetry_deps, dict) and 'airbyte-cdk' in poetry_deps:
+ dep = poetry_deps['airbyte-cdk']
+ if isinstance(dep, str):
+ poetry_deps['airbyte-cdk'] = '^7'
+ updated = True
+ elif isinstance(dep, dict):
+ dep['version'] = '^7'
+ updated = True
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
# Check poetry dependencies (dict format) | |
poetry_deps = pyproject_data.get('tool', {}).get('poetry', {}).get('dependencies', {}) | |
if isinstance(poetry_deps, dict) and 'airbyte-cdk' in poetry_deps: | |
poetry_deps['airbyte-cdk'] = '^7' | |
updated = True | |
# Check PEP 621 dependencies (list format) | |
project_deps = pyproject_data.get('project', {}).get('dependencies', []) | |
if isinstance(project_deps, list): | |
for i, dep in enumerate(project_deps): | |
if isinstance(dep, str) and dep.startswith('airbyte-cdk'): | |
project_deps[i] = 'airbyte-cdk>=7.0.0,<8.0.0' | |
updated = True | |
# Check poetry dependencies (dict format) | |
poetry_deps = pyproject_data.get('tool', {}).get('poetry', {}).get('dependencies', {}) | |
if isinstance(poetry_deps, dict) and 'airbyte-cdk' in poetry_deps: | |
dep = poetry_deps['airbyte-cdk'] | |
if isinstance(dep, str): | |
poetry_deps['airbyte-cdk'] = '^7' | |
updated = True | |
elif isinstance(dep, dict): | |
dep['version'] = '^7' | |
updated = True | |
# Check PEP 621 dependencies (list format) | |
project_deps = pyproject_data.get('project', {}).get('dependencies', []) | |
if isinstance(project_deps, list): | |
for i, dep in enumerate(project_deps): | |
if isinstance(dep, str) and dep.startswith('airbyte-cdk'): | |
project_deps[i] = 'airbyte-cdk>=7.0.0,<8.0.0' | |
updated = True |
🤖 Prompt for AI Agents
In bin/v7_migration.py around lines 449 to 462, the code only updates poetry
dependencies when the value is a string but Poetry often represents dependencies
as a table/dict (e.g., {version="^6", extras=[...]}); modify the logic so that
if poetry_deps['airbyte-cdk'] is a dict you update its 'version' field to '^7'
(or set 'version' = '^7' if missing) while preserving other keys like 'extras'
and markers, and still set updated = True; keep the existing behavior for string
values unchanged.
bin/v7_migration.py
Outdated
def main(): | ||
parser = argparse.ArgumentParser( | ||
description="Migrate Airbyte connectors to CDK version 7", | ||
formatter_class=argparse.RawDescriptionHelpFormatter, | ||
epilog=""" | ||
Examples: | ||
# Migrate 10 connectors (default count) | ||
python v7_migration.py --connectors-path /path/to/connectors | ||
|
||
# Migrate 5 connectors | ||
python v7_migration.py --connectors-path /path/to/connectors --count 5 | ||
|
||
# Migrate all eligible connectors (use a large number) | ||
python v7_migration.py --connectors-path /path/to/connectors --count 1000 | ||
|
||
# Use relative path | ||
python v7_migration.py --connectors-path ../connectors --count 3 | ||
""" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
CLI help/examples mismatch the actual flag
Examples show --connectors-path, but the parser defines --airbyte-repo-path. Could we align examples to the real flag (and rename the help to “Path to the Airbyte monorepo root”), wdyt?
- epilog="""
+ epilog="""
Examples:
# Migrate 10 connectors (default count)
- python v7_migration.py --connectors-path /path/to/connectors
+ python v7_migration.py --airbyte-repo-path /path/to/airbyte
...
- python v7_migration.py --connectors-path /path/to/connectors --count 5
+ python v7_migration.py --airbyte-repo-path /path/to/airbyte --count 5
...
- python v7_migration.py --connectors-path /path/to/connectors --count 1000
+ python v7_migration.py --airbyte-repo-path /path/to/airbyte --count 1000
...
- python v7_migration.py --connectors-path ../connectors --count 3
+ python v7_migration.py --airbyte-repo-path ../airbyte --count 3
"""
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def main(): | |
parser = argparse.ArgumentParser( | |
description="Migrate Airbyte connectors to CDK version 7", | |
formatter_class=argparse.RawDescriptionHelpFormatter, | |
epilog=""" | |
Examples: | |
# Migrate 10 connectors (default count) | |
python v7_migration.py --connectors-path /path/to/connectors | |
# Migrate 5 connectors | |
python v7_migration.py --connectors-path /path/to/connectors --count 5 | |
# Migrate all eligible connectors (use a large number) | |
python v7_migration.py --connectors-path /path/to/connectors --count 1000 | |
# Use relative path | |
python v7_migration.py --connectors-path ../connectors --count 3 | |
""" | |
) | |
def main(): | |
parser = argparse.ArgumentParser( | |
description="Migrate Airbyte connectors to CDK version 7", | |
formatter_class=argparse.RawDescriptionHelpFormatter, | |
epilog=""" | |
Examples: | |
# Migrate 10 connectors (default count) | |
python v7_migration.py --airbyte-repo-path /path/to/airbyte | |
# Migrate 5 connectors | |
python v7_migration.py --airbyte-repo-path /path/to/airbyte --count 5 | |
# Migrate all eligible connectors (use a large number) | |
python v7_migration.py --airbyte-repo-path /path/to/airbyte --count 1000 | |
# Use relative path | |
python v7_migration.py --airbyte-repo-path ../airbyte --count 3 | |
""" | |
) | |
# … rest of main() … |
🤖 Prompt for AI Agents
In bin/v7_migration.py around lines 647 to 666, the CLI examples use
--connectors-path while the parser actually defines --airbyte-repo-path; update
the epilog examples to use --airbyte-repo-path (so examples match the actual
flag), and change the parser.add_argument help text for --airbyte-repo-path to
read "Path to the Airbyte monorepo root" to better describe the flag.
PyTest Results (Fast)3 742 tests - 10 3 730 ✅ - 10 6m 22s ⏱️ -7s Results for commit 6dc3ebd. ± Comparison against base commit 5378cef. This pull request removes 91 and adds 81 tests. Note that renamed tests count towards both.
♻️ This comment has been updated with latest results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (8)
bin/v7_migration.py (8)
13-21
: Fix Ruff I001: group and sort imports (stdlib vs third‑party).This is currently failing CI. Shall we reorder and group to satisfy Ruff/isort, wdyt?
-import argparse -import re -import subprocess -from datetime import datetime -from pathlib import Path -from typing import Dict, List, Optional - -import toml -import yaml +import argparse +import re +import subprocess +from datetime import datetime +from pathlib import Path +from typing import Dict, List, Optional + +import toml +import yaml
678-690
: Align CLI examples with the actual flag name.Examples use --connectors-path but the parser defines --airbyte-repo-path. Shall we update the epilog to match, wdyt?
- python v7_migration.py --connectors-path /path/to/connectors + python v7_migration.py --airbyte-repo-path /path/to/airbyte @@ - python v7_migration.py --connectors-path /path/to/connectors --count 5 + python v7_migration.py --airbyte-repo-path /path/to/airbyte --count 5 @@ - python v7_migration.py --connectors-path /path/to/connectors --count 1000 + python v7_migration.py --airbyte-repo-path /path/to/airbyte --count 1000 @@ - python v7_migration.py --connectors-path ../connectors --count 3 + python v7_migration.py --airbyte-repo-path ../airbyte --count 3
354-381
: Don’t default to writing an invalid image digest (“”); gate the update behind a real value.Using "" risks committing a broken baseImage. Can we make the digest optional, skip the declarative update when absent, and add a CLI flag to pass it, wdyt?
-def migrate_to_cdk_v7(self, sha256_hash: str = "<TBD>") -> bool: +def migrate_to_cdk_v7(self, sha256_hash: Optional[str] = None) -> bool: @@ - if self._uses_declarative_manifest(): - success = self._migrate_declarative_manifest_to_v7(sha256_hash) + if self._uses_declarative_manifest(): + if not sha256_hash: + print(f"Skipping declarative base image update for {self.name}: missing --manifest-image-digest") + else: + success = self._migrate_declarative_manifest_to_v7(sha256_hash) or successAdd a CLI flag and plumb it through main (see separate comment below). Want me to wire this up end-to-end?
399-437
: Avoid regex-editing YAML; update the structured field and verify the change occurred.Regex can hit the wrong key and returns success even if no replacement happens. Shall we load metadata.yaml, set data.connectorBuildOptions.baseImage, and dump back (preserving keys), and only return True when the value actually changed, wdyt?
- # Load the current metadata - with open(metadata_file, "r") as f: - metadata_content = f.read() - - # Update the baseImage to version 7 - new_base_image = ( - f"docker.io/airbyte/source-declarative-manifest:7.0.0@sha256:{sha256_hash}" - ) - - # Replace the base image using regex to preserve formatting - base_image_pattern = r"(baseImage:\s*)[^\n\r]+" - replacement = rf"\g<1>{new_base_image}" - - updated_content = re.sub(base_image_pattern, replacement, metadata_content) - - # Write back the updated metadata - with open(metadata_file, "w") as f: - f.write(updated_content) + with open(metadata_file, "r", encoding="utf-8") as f: + meta = yaml.safe_load(f) or {} + new_base_image = f"docker.io/airbyte/source-declarative-manifest:7.0.0@sha256:{sha256_hash}" + data = meta.setdefault("data", {}) + cbo = data.setdefault("connectorBuildOptions", {}) + old = cbo.get("baseImage") + cbo["baseImage"] = new_base_image + if old == new_base_image: + print(f"{self.name}: baseImage already at CDK v7") + return True + with open(metadata_file, "w", encoding="utf-8") as f: + yaml.safe_dump(meta, f, sort_keys=False)If you’d like better formatting preservation, I can switch to ruamel.yaml, wdyt?
465-477
: Handle Poetry’s dict/table dependency form for airbyte-cdk.Poetry often declares dependencies as tables (extras, markers). Can we support both string and dict forms to avoid missing upgrades, wdyt?
- poetry_deps = pyproject_data.get("tool", {}).get("poetry", {}).get("dependencies", {}) - if isinstance(poetry_deps, dict) and "airbyte-cdk" in poetry_deps: - poetry_deps["airbyte-cdk"] = "^7" - updated = True + poetry_deps = pyproject_data.get("tool", {}).get("poetry", {}).get("dependencies", {}) + if isinstance(poetry_deps, dict) and "airbyte-cdk" in poetry_deps: + dep = poetry_deps["airbyte-cdk"] + if isinstance(dep, str): + poetry_deps["airbyte-cdk"] = "^7" + updated = True + elif isinstance(dep, dict): + dep["version"] = "^7" + updated = True
693-705
: Make --airbyte-repo-path required and validate connectors directory exists.
Path(None)
will crash; also we should verify<repo>/airbyte-integrations/connectors
is present. Shall we make the arg required and add clear checks, wdyt?- parser.add_argument( - "--airbyte-repo-path", - type=str, - default=None, - help="Path to the connectors directory (required)", - ) + parser.add_argument( + "--airbyte-repo-path", + type=Path, + required=True, + help="Path to the Airbyte monorepo root (containing airbyte-integrations/)", + ) @@ - airbyte_repo_path = Path(args.airbyte_repo_path).resolve() - if not airbyte_repo_path.exists(): - print(f"Error: airbyte_repo_path path does not exist: {airbyte_repo_path}") + airbyte_repo_path: Path = args.airbyte_repo_path.resolve() + if not airbyte_repo_path.exists(): + print(f"Error: repo path does not exist: {airbyte_repo_path}") return 1 @@ - print(f"Using connectors path: {airbyte_repo_path}") + print(f"Using repo path: {airbyte_repo_path}") @@ - for connector_path in get_connector_directories( - airbyte_repo_path / "airbyte-integrations" / "connectors" - ) + for connector_path in get_connector_directories( + airbyte_repo_path / "airbyte-integrations" / "connectors" + ) ] + connectors_root = airbyte_repo_path / "airbyte-integrations" / "connectors" + if not connectors_root.exists(): + print(f"Error: connectors directory not found at: {connectors_root}") + return 1Also applies to: 709-721
700-705
: Add a CLI flag for the manifest base image digest and plumb it through.This pairs with the earlier change to avoid defaulting to "". Shall we add the flag and pass it to migrate_to_cdk_v7, wdyt?
parser.add_argument( "--count", type=int, default=10, help="Number of eligible connectors to migrate (default: 10)", ) + parser.add_argument( + "--manifest-image-digest", + type=str, + default=None, + help="sha256 for docker.io/airbyte/source-declarative-manifest:7.0.0 (required to update declarative base image)", + ) @@ - eligible_for_migration[i].migrate_to_cdk_v7() + eligible_for_migration[i].migrate_to_cdk_v7(sha256_hash=args.manifest_image_digest)Also applies to: 734-737
742-743
: Propagate the exit code from main.Right now failures return 0. Shall we exit with main()’s return value, wdyt?
-if __name__ == "__main__": - main() +if __name__ == "__main__": + raise SystemExit(main())
🧹 Nitpick comments (8)
bin/v7_migration.py (8)
471-477
: Also update PEP 621 extras and preserve extras when present.Project-level deps can include extras like "airbyte-cdk[singer]>=6,<7" and optional-dependencies. Shall we preserve extras and handle optional-dependencies, wdyt?
- project_deps = pyproject_data.get("project", {}).get("dependencies", []) + project_table = pyproject_data.get("project", {}) + project_deps = project_table.get("dependencies", []) if isinstance(project_deps, list): for i, dep in enumerate(project_deps): - if isinstance(dep, str) and dep.startswith("airbyte-cdk"): - project_deps[i] = "airbyte-cdk>=7.0.0,<8.0.0" + if isinstance(dep, str) and dep.startswith("airbyte-cdk"): + # Preserve extras if present, e.g. airbyte-cdk[extra] + m = re.match(r"^(airbyte-cdk(?:\[[^\]]+\])?).*$", dep) + if m: + project_deps[i] = f"{m.group(1)}>=7.0.0,<8.0.0" updated = True + # Optional dependencies (extras) + opt_deps = project_table.get("optional-dependencies", {}) + if isinstance(opt_deps, dict): + for extra, deps in opt_deps.items(): + if isinstance(deps, list): + for i, dep in enumerate(deps): + if isinstance(dep, str) and dep.startswith("airbyte-cdk"): + m = re.match(r"^(airbyte-cdk(?:\[[^\]]+\])?).*$", dep) + if m: + deps[i] = f"{m.group(1)}>=7.0.0,<8.0.0" + updated = True
651-654
: Docstring says “source- and destination-*” but code returns only sources; also sort for determinism.*Shall we fix the docstring and sort directories for stable runs, wdyt?
-def get_connector_directories(connectors_path: Path) -> List[Path]: +def get_connector_directories(connectors_path: Path) -> List[Path]: @@ - Get all connector directories (source-* and destination-*). + Get all source connector directories (source-*). @@ - for item in connectors_path.iterdir(): + for item in connectors_path.iterdir(): if item.is_dir() and item.name.startswith("source-"): connector_dirs.append(item) - return connector_dirs + connector_dirs.sort(key=lambda p: p.name) + return connector_dirsAlso applies to: 666-671
142-175
: Broaden detection of “unbounded v6” specs in pyproject.You only match caret (“^6”). Some connectors use “>=6,<7” or “~6.x”. Should we include these patterns to avoid missing candidates, wdyt?
- if isinstance(cdk_version, str) and "^6" in cdk_version: + if isinstance(cdk_version, str) and re.search(r"(\^6|\b>=\s*6(\.\d+)*\s*,\s*<\s*7\b|~=\s*6(\.\d+)*)", cdk_version): return True @@ - if "^6" in dep: + if re.search(r"(\^6|\b>=\s*6(\.\d+)*\s*,\s*<\s*7\b|~=\s*6(\.\d+)*)", dep): return TrueIf you prefer robust parsing, I can switch this to use packaging.specifiers, wdyt?
47-51
: Use explicit UTF‑8 encoding for file IO.For cross‑platform consistency, shall we add
encoding="utf-8"
to reads/writes, wdyt?Example changes:
-with open(metadata_file, "r") as f: +with open(metadata_file, "r", encoding="utf-8") as f: -with open(manifest_file, "r") as f: +with open(manifest_file, "r", encoding="utf-8") as f: -with open(metadata_file, "w") as f: +with open(metadata_file, "w", encoding="utf-8") as f: -with open(pyproject_file, "r") as f: +with open(pyproject_file, "r", encoding="utf-8") as f: -with open(pyproject_file, "w") as f: +with open(pyproject_file, "w", encoding="utf-8") as f: -with open(changelog_file, "r", encoding="utf-8") as f: +with open(changelog_file, "r", encoding="utf-8") as f: -with open(changelog_file, "w", encoding="utf-8") as f: +with open(changelog_file, "w", encoding="utf-8") as f:Also applies to: 85-90, 431-433, 459-461, 479-482, 563-565, 594-596
311-344
: Reduce false positives when scanning deprecated imports by using AST.String matching can miss aliased imports and multi-line forms. Should we parse Python files with ast and look for import nodes referencing the target names/modules, wdyt?
I can replace the loop with an ast-based check that safely handles aliases and multiline imports while ignoring comments. Want me to draft that?
723-733
: Variable naming nit: v6_declarative_sources includes Python connectors too.To avoid confusion, shall we rename to
v6_connectors
(or similar), wdyt?- v6_declarative_sources = [c for c in connectors if c.is_using_unbounded_cdk_version_6()] - eligible_for_migration = [c for c in v6_declarative_sources if c.is_eligible_for_migration()] - non_eligible_for_migration = [ - c for c in v6_declarative_sources if not c.is_eligible_for_migration() + v6_connectors = [c for c in connectors if c.is_using_unbounded_cdk_version_6()] + eligible_for_migration = [c for c in v6_connectors if c.is_eligible_for_migration()] + non_eligible_for_migration = [ + c for c in v6_connectors if not c.is_eligible_for_migration() ] @@ - f"{len(eligible_for_migration)} sources eligible for automatic migration out of {len(v6_declarative_sources)} sources using CDK version 6" + f"{len(eligible_for_migration)} sources eligible for automatic migration out of {len(v6_connectors)} sources using CDK version 6"
3-12
: CLI/docs polish: remove TODO and add digest flag to “How to use.”Since we’ll add
--manifest-image-digest
, shall we update this header and drop the TODO, wdyt?-# * Have a venv with yaml and toml +# * Have a venv with yaml and toml installed @@ -# * Find and replace `[TBD](https://github.com/airbytehq/airbyte/pull/TBD)` +# * Pass --manifest-image-digest for declarative sources; update PR link in changelog as needed @@ -# * Set the hash for the source-declarative-manifest docker image (hardcoded as "<TBD>" for now) +# * Provide the hash for the source-declarative-manifest docker image via --manifest-image-digest
716-721
: Make migration order deterministic and consider skipping known exceptions via a denylist.For reproducibility, shall we sort
eligible_for_migration
by name and optionally accept a--skip
list or a built-in denylist for the connectors called out in the PR description, wdyt?- connectors = [ + connectors = [ Connector(connector_path) for connector_path in get_connector_directories( airbyte_repo_path / "airbyte-integrations" / "connectors" ) ] + connectors.sort(key=lambda c: c.name) @@ - for i in range(count_to_migrate): - eligible_for_migration[i].migrate_to_cdk_v7(sha256_hash=args.manifest_image_digest) + for connector in sorted(eligible_for_migration, key=lambda c: c.name)[:count_to_migrate]: + connector.migrate_to_cdk_v7(sha256_hash=args.manifest_image_digest)If you want, I can add a
--skip
arg and a small built-in denylist for the “custom” sources mentioned, wdyt?Also applies to: 734-739
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
bin/v7_migration.py
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
bin/v7_migration.py (1)
unit_tests/source_declarative_manifest/conftest.py (1)
components_file
(87-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-shopify
- GitHub Check: Pytest (Fast)
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (5)
bin/v7_migration.py (5)
799-800
: Propagate exit code
main()
returns a code but the process always exits 0. Shall we exit withSystemExit(main())
, wdyt?-if __name__ == "__main__": - main() +if __name__ == "__main__": + raise SystemExit(main())
410-427
: Avoid regex-editing YAML and remove hardcoded digestEditing
metadata.yaml
via regex is brittle, and pinning a digest in code risks staleness. Shall we (1) updatedata.connectorBuildOptions.baseImage
via YAML and (2) pass the digest from a CLI flag, skipping the base image update if not provided, wdyt?- def _migrate_declarative_manifest_to_v7(self) -> bool: + def _migrate_declarative_manifest_to_v7(self, sha256_digest: Optional[str]) -> bool: @@ - # Load the current metadata - with open(metadata_file, "r") as f: - metadata_content = f.read() - - # Update the baseImage to version 7 - new_base_image = ( - f"docker.io/airbyte/source-declarative-manifest:7.0.1@sha256:ff1e701c8f913cf24a0220f62c8e64cc1c3011ba0a636985f4db47fdab1391b6" - ) - - # Replace the base image using regex to preserve formatting - base_image_pattern = r"(baseImage:\s*)[^\n\r]+" - replacement = rf"\g<1>{new_base_image}" - - updated_content = re.sub(base_image_pattern, replacement, metadata_content) - - # Write back the updated metadata with the new base image - with open(metadata_file, "w") as f: - f.write(updated_content) + with open(metadata_file, "r", encoding="utf-8") as f: + meta = yaml.safe_load(f) or {} + if not sha256_digest: + print(f"Skipping baseImage update for {self.name}: missing --manifest-image-digest") + else: + new_base_image = f"docker.io/airbyte/source-declarative-manifest:7.0.1@sha256:{sha256_digest}" + meta.setdefault("data", {}).setdefault("connectorBuildOptions", {})["baseImage"] = new_base_image + with open(metadata_file, "w", encoding="utf-8") as f: + yaml.safe_dump(meta, f, sort_keys=False)
465-477
: Preserve Poetry tables and PEP 621 extras/markers when bumping to v7For Poetry,
airbyte-cdk
can be a table; for PEP 621, bumping by string replace drops extras/markers. Shall we keep extras/markers intact usingpackaging.Requirement
, wdyt?- poetry_deps = pyproject_data.get("tool", {}).get("poetry", {}).get("dependencies", {}) - if isinstance(poetry_deps, dict) and "airbyte-cdk" in poetry_deps: - poetry_deps["airbyte-cdk"] = "^7" - updated = True + poetry_deps = pyproject_data.get("tool", {}).get("poetry", {}).get("dependencies", {}) + if isinstance(poetry_deps, dict) and "airbyte-cdk" in poetry_deps: + dep = poetry_deps["airbyte-cdk"] + if isinstance(dep, str): + poetry_deps["airbyte-cdk"] = "^7" + elif isinstance(dep, dict): + dep["version"] = "^7" + updated = True @@ - if isinstance(project_deps, list): - for i, dep in enumerate(project_deps): - if isinstance(dep, str) and dep.startswith("airbyte-cdk"): - project_deps[i] = "airbyte-cdk>=7.0.0,<8.0.0" - updated = True + if isinstance(project_deps, list): + for i, dep in enumerate(project_deps): + if isinstance(dep, str): + try: + req = Requirement(dep) + except Exception: + continue + if req.name == "airbyte-cdk": + req.specifier = SpecifierSet(">=7.0.0,<8.0.0") + project_deps[i] = str(req) + updated = TrueAlso applies to: 471-477
731-747
: CLI polish: examples mismatch flag, arg should be required, and validate connectors dirExamples use
--connectors-path
but the parser defines--airbyte-repo-path
. Also--airbyte-repo-path
is optional yet immediately used, and we don’t verify<repo>/airbyte-integrations/connectors
exists. Shall we align examples, make the arg required, validate the connectors dir, and add a--manifest-image-digest
flag, wdyt?parser = argparse.ArgumentParser( description="Migrate Airbyte connectors to CDK version 7", formatter_class=argparse.RawDescriptionHelpFormatter, epilog=""" Examples: # Migrate 10 connectors (default count) - python v7_migration.py --connectors-path /path/to/connectors + python v7_migration.py --airbyte-repo-path /path/to/airbyte @@ # Migrate 5 connectors - python v7_migration.py --connectors-path /path/to/connectors --count 5 + python v7_migration.py --airbyte-repo-path /path/to/airbyte --count 5 @@ # Migrate all eligible connectors (use a large number) - python v7_migration.py --connectors-path /path/to/connectors --count 1000 + python v7_migration.py --airbyte-repo-path /path/to/airbyte --count 1000 @@ # Use relative path - python v7_migration.py --connectors-path ../connectors --count 3 + python v7_migration.py --airbyte-repo-path ../airbyte --count 3 """, ) @@ - parser.add_argument( - "--airbyte-repo-path", - type=str, - default=None, - help="Path to the connectors directory (required)", - ) + parser.add_argument( + "--airbyte-repo-path", + type=Path, + required=True, + help="Path to the Airbyte monorepo root (containing airbyte-integrations/)", + ) + parser.add_argument( + "--manifest-image-digest", + type=str, + default=None, + help="sha256 digest for docker.io/airbyte/source-declarative-manifest:7.0.1 (optional; skips baseImage update if omitted)", + ) @@ - airbyte_repo_path = Path(args.airbyte_repo_path).resolve() - if not airbyte_repo_path.exists(): - print(f"Error: airbyte_repo_path path does not exist: {airbyte_repo_path}") + airbyte_repo_path: Path = args.airbyte_repo_path.resolve() + if not airbyte_repo_path.exists(): + print(f"Error: repo path does not exist: {airbyte_repo_path}") return 1 @@ - print(f"Using connectors path: {airbyte_repo_path}") + print(f"Using repo path: {airbyte_repo_path}") @@ - connectors = [ - Connector(connector_path) - for connector_path in get_connector_directories( - airbyte_repo_path / "airbyte-integrations" / "connectors" - ) - ] + connectors_root = airbyte_repo_path / "airbyte-integrations" / "connectors" + if not connectors_root.exists(): + print(f"Error: connectors directory not found at: {connectors_root}") + return 1 + connectors = [Connector(p) for p in get_connector_directories(connectors_root)] @@ - eligible_for_migration[i].migrate_to_cdk_v7() + eligible_for_migration[i].migrate_to_cdk_v7(sha256_hash=args.manifest_image_digest)Also applies to: 751-756, 766-769, 773-778, 791-795
352-376
: Thread the digest from CLI and call the updated helperIf we add a digest flag, we’ll need to thread it through. Shall we extend the signature and call site, wdyt?
- def migrate_to_cdk_v7(self) -> bool: + def migrate_to_cdk_v7(self, sha256_hash: Optional[str] = None) -> bool: @@ - if self._uses_declarative_manifest(): - success = self._migrate_declarative_manifest_to_v7() + if self._uses_declarative_manifest(): + success = self._migrate_declarative_manifest_to_v7(sha256_hash)
🧹 Nitpick comments (5)
bin/v7_migration.py (5)
589-656
: Keep the CDK version label consistent in changelog vs base imageChangelog says “v7.0.0” but base image uses “7.0.1”. Would you prefer “Update to CDK v7” to avoid drift, or centralize the version in a constant used in both places, wdyt?
- new_entry = f"| {next_version} | {current_date} | [TBD](https://github.com/airbytehq/airbyte/pull/TBD) | Update to CDK v7.0.0 |" + new_entry = f"| {next_version} | {current_date} | [TBD](https://github.com/airbytehq/airbyte/pull/TBD) | Update to CDK v7 |"Also applies to: 630-632, 416-417
50-92
: Return None instead of empty string for missing manifest (aligns with Optional[str])The property advertises Optional[str] but caches empty string on miss. Shall we return None to match the type and simplify truthiness checks, wdyt?
- # No manifest found in any location - self._manifest_content = "" - return self._manifest_content + # No manifest found in any location + self._manifest_content = None + return self._manifest_content
44-46
: Specify UTF-8 encoding for file I/OSeveral reads/writes omit
encoding="utf-8"
. Adding it avoids locale-dependent bugs. Shall we standardize on UTF-8 across the script, wdyt?- with open(metadata_file, "r") as f: + with open(metadata_file, "r", encoding="utf-8") as f: return yaml.safe_load(f) @@ - with open(metadata_file, "r") as f: + with open(metadata_file, "r", encoding="utf-8") as f: metadata_content = f.read() @@ - with open(pyproject_file, "r") as f: + with open(pyproject_file, "r", encoding="utf-8") as f: pyproject_data = toml.load(f) @@ - with open(pyproject_file, "w") as f: + with open(pyproject_file, "w", encoding="utf-8") as f: # WARNING: using toml to rewrite stuff can lead to changing the order/the formatting on some things toml.dump(pyproject_data, f) @@ - with open(metadata_file, "r") as f: + with open(metadata_file, "r", encoding="utf-8") as f: metadata_content = f.read() @@ - with open(metadata_file, "w") as f: + with open(metadata_file, "w", encoding="utf-8") as f: f.write(updated_content) @@ - with open(changelog_file, "w", encoding="utf-8") as f: + with open(changelog_file, "w", encoding="utf-8") as f: f.write(updated_content)Also applies to: 411-427, 460-483, 522-541, 617-653
709-716
: Docstring says “source- and destination-*” but code filters only sources*Either update the docstring to “source-* only” or include destinations in the filter. Which do you prefer, wdyt?
- Get all connector directories (source-* and destination-*). + Get all source connector directories (source-*).
771-771
: Log wording nit: repo vs connectors pathThis prints the repo root, not a connectors path. Shall we reword to “Using repo path”, wdyt?
- print(f"Using connectors path: {airbyte_repo_path}") + print(f"Using repo path: {airbyte_repo_path}")
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
bin/v7_migration.py
(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Linters
bin/v7_migration.py
[error] 357-363: Ruff format check failed. 1 file would be reformatted by 'ruff format --diff'. Run 'ruff format' to apply formatting changes. Command: poetry run ruff format --diff .
[error] 412-420: Ruff format check failed. 1 file would be reformatted by 'ruff format --diff'. Run 'ruff format' to apply formatting changes. Command: poetry run ruff format --diff .
[error] 525-545: Ruff format check failed. 1 file would be reformatted by 'ruff format --diff'. Run 'ruff format' to apply formatting changes. Command: poetry run ruff format --diff .
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: source-intercom
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
🔇 Additional comments (1)
bin/v7_migration.py (1)
357-363
: Run ruff format to apply formatting fixes
CI is currently flagging formatting errors in the specified ranges—can you runruff format .
(or targetbin/v7_migration.py
) and commit the resulting diff to unblock CI? wdyt?
pyproject_file = self.path / "pyproject.toml" | ||
if pyproject_file.exists(): | ||
try: | ||
with open(pyproject_file, "r") as f: | ||
pyproject_data = toml.load(f) | ||
|
||
# Check dependencies in different sections | ||
dependencies_sections = [ | ||
pyproject_data.get("tool", {}).get("poetry", {}).get("dependencies", {}), | ||
pyproject_data.get("project", {}).get("dependencies", []), | ||
] | ||
|
||
# Check poetry dependencies (dict format) | ||
poetry_deps = dependencies_sections[0] | ||
if isinstance(poetry_deps, dict) and "airbyte-cdk" in poetry_deps: | ||
cdk_version = poetry_deps["airbyte-cdk"] | ||
# Check if the version specification contains a caret sign | ||
if isinstance(cdk_version, str) and "^6" in cdk_version: | ||
return True | ||
|
||
# Check PEP 621 dependencies (list format) | ||
project_deps = dependencies_sections[1] | ||
if isinstance(project_deps, list): | ||
for dep in project_deps: | ||
if isinstance(dep, str) and dep.startswith("airbyte-cdk"): | ||
# Check if the dependency specification contains a caret sign | ||
if "^6" in dep: | ||
return True | ||
|
||
except Exception as e: | ||
print(f"Error reading pyproject.toml for {self.name}: {e}") | ||
|
||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Broaden v6 detection for PEP 621 deps and Poetry table deps
The PEP 621 path only checks for the literal “^6”, which won’t occur (PEP 621 uses PEP 440 specifiers). Also Poetry often defines airbyte-cdk
as a table. Could we (a) handle Poetry dicts and (b) parse PEP 621 strings with packaging
to detect “>=6,<7” ranges, wdyt?
+from packaging.requirements import Requirement
+from packaging.specifiers import SpecifierSet
@@
- poetry_deps = dependencies_sections[0]
- if isinstance(poetry_deps, dict) and "airbyte-cdk" in poetry_deps:
- cdk_version = poetry_deps["airbyte-cdk"]
- # Check if the version specification contains a caret sign
- if isinstance(cdk_version, str) and "^6" in cdk_version:
- return True
+ poetry_deps = dependencies_sections[0]
+ if isinstance(poetry_deps, dict) and "airbyte-cdk" in poetry_deps:
+ dep = poetry_deps["airbyte-cdk"]
+ if isinstance(dep, str) and dep.strip().startswith("^6"):
+ return True
+ if isinstance(dep, dict):
+ ver = str(dep.get("version", "")).strip()
+ if ver.startswith("^6"):
+ return True
@@
- project_deps = dependencies_sections[1]
- if isinstance(project_deps, list):
- for dep in project_deps:
- if isinstance(dep, str) and dep.startswith("airbyte-cdk"):
- # Check if the dependency specification contains a caret sign
- if "^6" in dep:
- return True
+ project_deps = dependencies_sections[1]
+ if isinstance(project_deps, list):
+ for dep in project_deps:
+ if isinstance(dep, str):
+ try:
+ req = Requirement(dep)
+ except Exception:
+ continue
+ if req.name == "airbyte-cdk":
+ # Treat as v6 if spec includes 6.x but excludes 7.x
+ spec: SpecifierSet = req.specifier
+ if spec.contains("6.0.0", prereleases=True) and not spec.contains("7.0.0", prereleases=True):
+ return True
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
pyproject_file = self.path / "pyproject.toml" | |
if pyproject_file.exists(): | |
try: | |
with open(pyproject_file, "r") as f: | |
pyproject_data = toml.load(f) | |
# Check dependencies in different sections | |
dependencies_sections = [ | |
pyproject_data.get("tool", {}).get("poetry", {}).get("dependencies", {}), | |
pyproject_data.get("project", {}).get("dependencies", []), | |
] | |
# Check poetry dependencies (dict format) | |
poetry_deps = dependencies_sections[0] | |
if isinstance(poetry_deps, dict) and "airbyte-cdk" in poetry_deps: | |
cdk_version = poetry_deps["airbyte-cdk"] | |
# Check if the version specification contains a caret sign | |
if isinstance(cdk_version, str) and "^6" in cdk_version: | |
return True | |
# Check PEP 621 dependencies (list format) | |
project_deps = dependencies_sections[1] | |
if isinstance(project_deps, list): | |
for dep in project_deps: | |
if isinstance(dep, str) and dep.startswith("airbyte-cdk"): | |
# Check if the dependency specification contains a caret sign | |
if "^6" in dep: | |
return True | |
except Exception as e: | |
print(f"Error reading pyproject.toml for {self.name}: {e}") | |
return False | |
# At the top of bin/v7_migration.py, add these imports: | |
from packaging.requirements import Requirement | |
from packaging.specifiers import SpecifierSet | |
pyproject_file = self.path / "pyproject.toml" | |
if pyproject_file.exists(): | |
try: | |
with open(pyproject_file, "r") as f: | |
pyproject_data = toml.load(f) | |
# Check dependencies in different sections | |
dependencies_sections = [ | |
pyproject_data.get("tool", {}).get("poetry", {}).get("dependencies", {}), | |
pyproject_data.get("project", {}).get("dependencies", []), | |
] | |
# Check poetry dependencies (dict format) | |
poetry_deps = dependencies_sections[0] | |
if isinstance(poetry_deps, dict) and "airbyte-cdk" in poetry_deps: | |
dep = poetry_deps["airbyte-cdk"] | |
if isinstance(dep, str) and dep.strip().startswith("^6"): | |
return True | |
if isinstance(dep, dict): | |
ver = str(dep.get("version", "")).strip() | |
if ver.startswith("^6"): | |
return True | |
# Check PEP 621 dependencies (list format) | |
project_deps = dependencies_sections[1] | |
if isinstance(project_deps, list): | |
for dep in project_deps: | |
if isinstance(dep, str): | |
try: | |
req = Requirement(dep) | |
except Exception: | |
continue | |
if req.name == "airbyte-cdk": | |
# Treat as v6 if spec includes 6.x but excludes 7.x | |
spec: SpecifierSet = req.specifier | |
if spec.contains("6.0.0", prereleases=True) and not spec.contains("7.0.0", prereleases=True): | |
return True | |
except Exception as e: | |
print(f"Error reading pyproject.toml for {self.name}: {e}") | |
return False |
🤖 Prompt for AI Agents
In bin/v7_migration.py around lines 140-172, the current logic only detects a
literal "^6" in Poetry string deps and PEP 621 list strings, missing Poetry
table entries and PEP 621 table or PEP 440 specifiers like ">=6,<7"; update
detection to: treat Poetry dependencies values that are dicts/tables (e.g., {
"version": ">=6,<7" } or nested tables) by extracting their version field and
also handle PEP 621 entries that can be either strings or tables (dicts) by
normalizing to a version specifier string; then use packaging
(packaging.requirements.Requirement or packaging.specifiers.SpecifierSet) to
parse the requirement/specifier and return True when the specifier allows any
6.x version (e.g., spec intersects the range >=6,<7) or when a caret-style
string beginning with ^6 is present—ensure exceptions are caught and logged as
before.
def is_eligible_for_migration(self) -> bool: | ||
return ( | ||
not self.has_custom_incremental_sync() | ||
and not self.has_custom_retriever() | ||
and not self.has_custom_partition_router() | ||
and not self.imports_deprecated_class() | ||
and not self.has_pyproject_toml() | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Python connectors are excluded from eligibility by mistake
is_eligible_for_migration()
returns False for any connector with a pyproject.toml
, so Python sources detected in is_using_unbounded_cdk_version_6()
can never be migrated. Shall we allow Python connectors that don’t use custom components or deprecated classes, wdyt?
def is_eligible_for_migration(self) -> bool:
- return (
- not self.has_custom_incremental_sync()
- and not self.has_custom_retriever()
- and not self.has_custom_partition_router()
- and not self.imports_deprecated_class()
- and not self.has_pyproject_toml()
- )
+ return (
+ not self.has_custom_incremental_sync()
+ and not self.has_custom_retriever()
+ and not self.has_custom_partition_router()
+ and not self.imports_deprecated_class()
+ )
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def is_eligible_for_migration(self) -> bool: | |
return ( | |
not self.has_custom_incremental_sync() | |
and not self.has_custom_retriever() | |
and not self.has_custom_partition_router() | |
and not self.imports_deprecated_class() | |
and not self.has_pyproject_toml() | |
) | |
def is_eligible_for_migration(self) -> bool: | |
return ( | |
not self.has_custom_incremental_sync() | |
and not self.has_custom_retriever() | |
and not self.has_custom_partition_router() | |
and not self.imports_deprecated_class() | |
) |
🤖 Prompt for AI Agents
In bin/v7_migration.py around lines 343 to 351, the current eligibility check
incorrectly excludes any connector with a pyproject.toml (preventing Python
connectors from being migrated); remove the "and not self.has_pyproject_toml()"
clause from the returned boolean expression so Python-based connectors that do
not use custom incremental syncs, custom retrievers, custom partition routers,
or deprecated classes remain eligible for migration.
/autofix
|
The script will also not migrate the following even though they should be fine:
Summary by CodeRabbit
New Features
Chores
Style