@Mantisus Mantisus commented Feb 15, 2025

Description

  • Changes the PlaywrightCrawler from using the standard browser context to using a persistent browser context.
  • Allows passing a user_data_dir with the path to the directory for the context. If user_data_dir is not provided, a temporary directory will be created.
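The fallback described above (use the given `user_data_dir`, otherwise create a temporary one) could look roughly like the sketch below. This is not the code from this PR: `resolve_user_data_dir` is a hypothetical helper, and deleting the temporary directory on exit is an assumption that the description does not state.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Optional


@contextmanager
def resolve_user_data_dir(user_data_dir: Optional[str] = None) -> Iterator[Path]:
    """Yield a profile directory for a persistent browser context.

    Hypothetical helper, not part of Crawlee or Playwright.
    """
    if user_data_dir is not None:
        # A caller-supplied directory is created if missing and left in place.
        path = Path(user_data_dir)
        path.mkdir(parents=True, exist_ok=True)
        yield path
        return

    # No directory given: fall back to a fresh temporary one.
    temp_path = Path(tempfile.mkdtemp(prefix="pw-profile-"))
    try:
        yield temp_path
    finally:
        # Assumption: the temporary profile is removed once it is no longer used.
        shutil.rmtree(temp_path, ignore_errors=True)


if __name__ == "__main__":
    with resolve_user_data_dir() as profile_dir:
        print(f"temporary profile directory: {profile_dir}")
```

The yielded path is what would be handed to Playwright's `launch_persistent_context` as its `user_data_dir` argument.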

Issues

@Mantisus Mantisus requested review from vdusek and Pijukatel and removed request for vdusek February 15, 2025 03:07
@Mantisus Mantisus self-assigned this Feb 15, 2025
@Mantisus Mantisus changed the title from "refactor!: change default 'incognito context' to 'persistent context' for Playwright" to "refactor!: change default incognito context to persistent context for Playwright" Feb 15, 2025
@vdusek vdusek added this to the 108th sprint - Tooling team milestone Feb 19, 2025

@vdusek vdusek left a comment

Since this PR contains a breaking change, could you please describe the breaking changes in the PR's description and also summarize them in the Upgrading guide?

Mantisus and others added 16 commits February 19, 2025 13:26
### Description

- fix public imports in `__init__` files
- Add `rich` to direct dependencies. It is one of `cookiecutter`'s
dependencies, but we use it directly in `statistics._models.py`

---------

Co-authored-by: Vlada Dusek <[email protected]>
### Description

Add adaptive context helpers and documentation for
AdaptivePlaywrightCrawler.

### Issues

- Closes: apify#249

---------

Co-authored-by: Jan Buchar <[email protected]>
…#988)

### Description

- change custom `LRUCache` to `cachetools.LRUCache`. In my opinion,
`functools.lru_cache`'s logic isn't well-suited for this use case.
Therefore, if we want to modify our caching approach, using `cachetools`
appears to be a better option.
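One plausible reading of the point above: `functools.lru_cache` memoizes calls to a decorated function, while `cachetools.LRUCache` is a bounded mapping that code can read and write by key. The sketch below illustrates that mapping style using only the standard library. `TinyLRUCache` is a hypothetical stand-in written for this example; it is neither the removed custom class nor `cachetools` itself.

```python
from collections import OrderedDict
from typing import Any, Hashable


class TinyLRUCache:
    """Mapping-style LRU cache (illustrative stand-in, stdlib only)."""

    def __init__(self, maxsize: int) -> None:
        if maxsize < 1:
            raise ValueError("maxsize must be at least 1")
        self._maxsize = maxsize
        # Least recently used entries sit at the front, most recent at the end.
        self._items = OrderedDict()

    def __setitem__(self, key: Hashable, value: Any) -> None:
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self._maxsize:
            # Evict the least recently used entry.
            self._items.popitem(last=False)

    def __getitem__(self, key: Hashable) -> Any:
        value = self._items[key]  # raises KeyError when the key is absent
        self._items.move_to_end(key)  # a read counts as a use
        return value

    def __contains__(self, key: Hashable) -> bool:
        return key in self._items

    def __len__(self) -> int:
        return len(self._items)


if __name__ == "__main__":
    cache = TinyLRUCache(maxsize=2)
    cache["a"] = 1
    cache["b"] = 2
    _ = cache["a"]  # "a" becomes the most recently used key
    cache["c"] = 3  # evicts "b", the least recently used key
    print("b" in cache, "a" in cache)
```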

### Issues

- Closes: apify#86
### Description

- update curl-cffi version requirement to >=0.9.0.
- update default `impersonate` from `chrome124` to `chrome131`
- Migrate from `poetry` to `uv`.
- Relates: apify#628
- The update of templates to use `uv` will be implemented separately.
- `project.urls`
- python 3.13 in ci
- unify name "Set up uv package manager"
- fix contributing guide
- add all extra, remove dev extra (move to dev deps)
- relates: apify#628
Mantisus and others added 6 commits February 19, 2025 17:14
…pify#959)

Add `additional_http_error_status_codes` and
`ignore_http_error_status_codes` to PlaywrightCrawler.
Since they exist now on all crawlers, move them to `BasicCrawler` level.
Do not use `_http_client` attributes for getting additional status codes
related variables.

**Breaking:** Remove `HttpCrawlerOptions` -> No unique options compared
to `BasicCrawlerOptions` anymore.

- Closes: apify#953
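For context, the way the two options named above can combine is sketched here. This is only an illustration: `is_error_status` is a hypothetical function, and both the default (5xx counts as an error) and the precedence (an explicitly added code wins over an ignored one) are assumptions rather than behavior confirmed by this commit message.

```python
from typing import Iterable


def is_error_status(
    status_code: int,
    additional_http_error_status_codes: Iterable[int] = (),
    ignore_http_error_status_codes: Iterable[int] = (),
) -> bool:
    """Decide whether a response status should be treated as an error.

    Hypothetical helper; the default and the precedence are assumptions.
    """
    if status_code in set(additional_http_error_status_codes):
        # Explicitly added codes are always treated as errors.
        return True
    if status_code in set(ignore_http_error_status_codes):
        # Explicitly ignored codes are never treated as errors.
        return False
    # Assumed default: server errors (5xx) are errors, everything else is not.
    return 500 <= status_code <= 599


if __name__ == "__main__":
    print(is_error_status(503))
    print(is_error_status(403, additional_http_error_status_codes=[403]))
    print(is_error_status(503, ignore_http_error_status_codes=[503]))
```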
@Mantisus Mantisus requested review from Pijukatel and vdusek February 19, 2025 17:22

@vdusek vdusek left a comment

There are failing tests for Windows Python 3.12.

@Mantisus Mantisus force-pushed the pw-persist-browser-context branch from 6b6236d to 5e97e31 Compare February 20, 2025 23:25
@Mantisus

> There are failing tests for Windows Python 3.12.

The problem is not related to this PR.

It is fixed in PR #1007.

@Pijukatel Pijukatel left a comment

Thanks for the tests, just two minor comments.

@Mantisus Mantisus requested review from vdusek and Pijukatel February 21, 2025 13:41

@vdusek vdusek left a comment

Looks good, just a few comments.

@Mantisus Mantisus requested a review from vdusek February 25, 2025 02:45

@vdusek vdusek left a comment

LGTM

@vdusek vdusek merged commit f01520d into apify:master Feb 25, 2025
23 checks passed
@vdusek vdusek added the t-tooling Issues with this label are in the ownership of the tooling team. label Feb 25, 2025
Successfully merging this pull request may close these issues.

Add support for launching Playwright with a persistent browser context
Implement option for persistent context to PlaywrightCrawler
3 participants