@Mantisus Mantisus commented Aug 4, 2025

Description

  • Persist DefaultRenderingTypePredictor state

Issues

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds persistence to DefaultRenderingTypePredictor: the trained model and its associated data are saved to and restored from a key-value store, so the predictor keeps what it has learned across runs.

  • Adds persistence support with configurable key-value storage integration
  • Implements async context manager pattern for proper resource management
  • Introduces state serialization/deserialization for scikit-learn models
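As a rough illustration of how the three points above fit together, here is a minimal, stdlib-only sketch. Every name in it (`InMemoryKeyValueStore`, `MajorityLabelModel`, `PersistentPredictor`, the `predictor-state` key) is hypothetical and is not taken from the PR or from crawlee's API. The toy model stands in for a scikit-learn estimator, and JSON is assumed as the serialization format purely for the sake of the example.

```python
from __future__ import annotations

import asyncio
import json


class InMemoryKeyValueStore:
    """Hypothetical stand-in for a key-value store (not crawlee's API)."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    async def get_value(self, key: str) -> str | None:
        return self._data.get(key)

    async def set_value(self, key: str, value: str) -> None:
        self._data[key] = value


class MajorityLabelModel:
    """Toy model standing in for a trained scikit-learn estimator."""

    def __init__(self, counts: dict[str, int] | None = None) -> None:
        self.counts: dict[str, int] = dict(counts or {})

    def fit_one(self, label: str) -> None:
        self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self) -> str | None:
        if not self.counts:
            return None
        return max(self.counts, key=lambda label: self.counts[label])


class PersistentPredictor:
    """Sketch: restore state on enter, persist it on exit."""

    def __init__(self, store: InMemoryKeyValueStore, persist_key: str = 'predictor-state') -> None:
        self._store = store
        self._persist_key = persist_key
        self.model = MajorityLabelModel()

    async def __aenter__(self) -> PersistentPredictor:
        raw = await self._store.get_value(self._persist_key)
        if raw is not None:
            # Deserialize the saved parameters back into a model.
            self.model = MajorityLabelModel(json.loads(raw)['counts'])
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        # Serialize the learned parameters so the next run can reuse them.
        payload = json.dumps({'counts': self.model.counts})
        await self._store.set_value(self._persist_key, payload)


async def demo() -> str | None:
    store = InMemoryKeyValueStore()
    async with PersistentPredictor(store) as first_run:
        first_run.model.fit_one('client only')
        first_run.model.fit_one('client only')
        first_run.model.fit_one('static')
    # A fresh instance sharing the same store sees the earlier training.
    async with PersistentPredictor(store) as second_run:
        return second_run.model.predict()
```

Calling `asyncio.run(demo())` trains in one `async with` block and predicts in a second one, which only works because the state was written on exit and read back on enter.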

Reviewed Changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/crawlee/crawlers/_adaptive_playwright/_rendering_type_predictor.py | Core implementation of persistence with RecoverableState integration and async context manager |
| src/crawlee/crawlers/_adaptive_playwright/_utils.py | Utility functions for scikit-learn model serialization and validation |
| src/crawlee/crawlers/_adaptive_playwright/_adaptive_playwright_crawler.py | Integration of predictor into crawler's context managers |
| tests/unit/crawlers/_adaptive_playwright/test_predictor.py | Updated tests to use async context manager and added persistence tests |
| tests/unit/crawlers/_adaptive_playwright/test_adaptive_playwright_crawler.py | Added `super().__init__()` call to test mock class |
| docs/guides/code_examples/playwright_crawler_adaptive/init_prediction.py | Updated example to properly call parent constructor |
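
The last two entries concern subclasses calling the parent constructor. A generic sketch of why that matters once a base class starts holding its own state is shown here. The class names are hypothetical and do not mirror the actual predictor or test mock in the PR.

```python
from __future__ import annotations


class BasePredictor:
    """Hypothetical base class that sets up internal state in its constructor."""

    def __init__(self) -> None:
        self._history: list[str] = []

    def store_result(self, label: str) -> None:
        # Relies on state created in BasePredictor.__init__.
        self._history.append(label)


class BrokenMock(BasePredictor):
    def __init__(self) -> None:
        # Parent constructor is never called, so _history does not exist.
        self.calls = 0


class FixedMock(BasePredictor):
    def __init__(self) -> None:
        super().__init__()  # initialize the parent's state first
        self.calls = 0


def try_store(predictor: BasePredictor) -> bool:
    """Return True if the inherited method works on this instance."""
    try:
        predictor.store_result('static')
    except AttributeError:
        return False
    return True
```

Here `try_store(BrokenMock())` fails on the missing attribute while `try_store(FixedMock())` succeeds, which is the kind of breakage a `super().__init__()` call guards against.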
Comments suppressed due to low confidence (1)

tests/unit/crawlers/_adaptive_playwright/test_predictor.py:27

  • The function name 'ictor_same_label' appears to be truncated or misspelled. It should likely be 'test_predictor_same_label' or similar.
async def ictor_same_label(url: str, expected_prediction: RenderingType, label: str | None) -> None:

@Mantisus Mantisus self-assigned this Aug 4, 2025

@Pijukatel Pijukatel left a comment

Just two tiny comments

@Pijukatel Pijukatel merged commit fad4c25 into apify:master Aug 12, 2025
19 checks passed
Development

Successfully merging this pull request may close these issues.

Persist the DefaultRenderingTypePredictor state
2 participants