Add ignore_http_error_status_codes and additional_http_error_status_codes arguments to PlaywrightCrawler #953

@Pijukatel

Description

Currently, the arguments that control how different HTTP status codes are handled are available only in the static HTTP-based crawlers. They can be passed to the crawler's __init__, but PlaywrightCrawler does not accept them. For example, if someone wants to ignore a 403 error, with ParselCrawler they can write:

crawler = ParselCrawler(..., ignore_http_error_status_codes={403})

With PlaywrightCrawler, however, they have to set a private attribute, something like this:

crawler = PlaywrightCrawler(...)
crawler._http_client._ignore_http_error_status_codes = {403}

This is very confusing, and users are unlikely to discover it. PlaywrightCrawler should be aligned with the other crawlers, so that both arguments can be set in __init__.
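
To make the requested behavior concrete, here is a minimal sketch of how the two arguments might combine when deciding whether a response counts as an error. It is not the library's implementation: `is_error_status` is a hypothetical helper, and both the default rule (any 4xx or 5xx code is an error) and the rejection of overlapping sets are assumptions made for illustration.

```python
from collections.abc import Iterable


def is_error_status(
    status_code: int,
    ignore_http_error_status_codes: Iterable[int] = (),
    additional_http_error_status_codes: Iterable[int] = (),
) -> bool:
    """Hypothetical helper: decide whether a status code should count as an error."""
    ignored = set(ignore_http_error_status_codes)
    additional = set(additional_http_error_status_codes)

    # Assumption: a code listed in both sets is ambiguous, so reject it early.
    overlap = ignored & additional
    if overlap:
        raise ValueError(f"Status codes listed as both ignored and additional: {sorted(overlap)}")

    if status_code in additional:
        # Explicitly promoted to an error, even if it is normally fine (e.g. 200).
        return True
    if status_code in ignored:
        # Explicitly tolerated, even if it is normally an error (e.g. 403).
        return False

    # Assumed default: all 4xx and 5xx codes are errors.
    return 400 <= status_code <= 599
```

Under these assumptions, `is_error_status(403, ignore_http_error_status_codes={403})` returns `False`, which is the outcome the issue asks to be configurable directly through `PlaywrightCrawler(...)`.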

Metadata

Labels

enhancement: New feature or request.
t-tooling: Issues with this label are in the ownership of the tooling team.
