-
Notifications
You must be signed in to change notification settings - Fork 483
fix: add support non-GET requests for PlaywrightCrawler
#1208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds support for issuing non-GET HTTP requests (including custom methods, headers, and payloads) in the PlaywrightCrawler
by leveraging Playwright’s routing API, and updates the existing unit test to cover GET, POST, and POST-with-payload scenarios.
- Introduces a private
_prepare_request_handler
to configure method, headers, and payload viaroute.continue_
. - Emits a warning for performance implications when using non-GET methods.
- Refactors
test_basic_request
to parametrize HTTP methods and payloads.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
tests/unit/crawlers/_playwright/test_playwright_crawler.py | Refactored test_basic_request into a parametrized test covering GET, POST, and POST-with-payload. |
src/crawlee/crawlers/_playwright/_playwright_crawler.py | Added _prepare_request_handler , warning logic in _navigate , and route registration for custom requests. |
Comments suppressed due to low confidence (3)
tests/unit/crawlers/_playwright/test_playwright_crawler.py:51
- Consider adding a test case that passes custom headers (e.g.
headers={'X-Test': 'value'}
) to verify non-GET requests include expected HTTP headers.
async def test_basic_request(method: HttpMethod, path: str, payload: HttpPayload, server_url: URL) -> None:
src/crawlee/crawlers/_playwright/_playwright_crawler.py:230
- Passing raw bytes as
post_data
may cause a type error in Playwright; consider decoding payload tostr
(e.g.payload.decode()
) or otherwise converting it to the format expected byroute.continue_
.
await route.continue_(method=method, headers=dict(headers) if headers else None, post_data=payload)
src/crawlee/crawlers/_playwright/_playwright_crawler.py:230
- Using
dict(headers)
may not correctly serialize theHttpHeaders
model; consider callingheaders.model_dump()
or.dict()
to produce a properdict[str, str]
.
await route.continue_(method=method, headers=dict(headers) if headers else None, post_data=payload)
PlaywrightCrawler
PlaywrightCrawler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks! I onlyf found some naming and documentation issues.
await hook(pre_navigation_context) | ||
yield pre_navigation_context | ||
|
||
def _prepare_request_handler( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
request_handler
is already taken, let's come up with a better name. How does _prepare_non_get_request_interceptor
sound? Not great, huh... but it's a start, I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps just _prepare_request_interceptor
? The method is not public, maybe we can name it shorter.
headers: HttpHeaders | dict[str, str] | None = None, | ||
payload: HttpPayload | None = None, | ||
) -> Callable: | ||
"""Create a route handler for Playwright to support non-GET methods with custom parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd mention that the purpose of the interception is to add custom headers and payload to the request.
Description
PlaywrightCrawler
usingRoute
Issues