Closed
Labels
enhancement: New feature or request.
t-tooling: Issues with this label are in the ownership of the tooling team.
Description
I'm trying to add response caching via hishel transports, but I don't see a way to customize the transport used by the Crawlee client, because the client is created internally in _get_client():
def _get_client(self, proxy_url: str | None) -> httpx.AsyncClient:
    """Helper to get a HTTP client for the given proxy URL.

    If the client for the given proxy URL doesn't exist, it will be created and stored.
    """
    if proxy_url not in self._client_by_proxy_url:
        # Prepare a default kwargs for the new client.
        kwargs: dict[str, Any] = {
            'transport': _HttpxTransport(
                proxy=proxy_url,
                http1=self._http1,
                http2=self._http2,
            ),
            'proxy': proxy_url,
            'http1': self._http1,
            'http2': self._http2,
        }

        # Update the default kwargs with any additional user-provided kwargs.
        kwargs.update(self._async_client_kwargs)

        client = httpx.AsyncClient(**kwargs)
        self._client_by_proxy_url[proxy_url] = client

    return self._client_by_proxy_url[proxy_url]
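One thing the snippet above suggests: because `kwargs.update(self._async_client_kwargs)` runs after the defaults are built, a `transport` key in the user-provided kwargs would replace the default one. The following is only a minimal sketch of that merge step using plain dicts. `DefaultTransport`, `CachingTransport`, and `build_client_kwargs` are hypothetical stand-ins, not Crawlee or hishel APIs, and the sketch does not show how `_async_client_kwargs` gets populated or whether a transport supplied this way would honor the proxy.

```python
from __future__ import annotations

from typing import Any


class DefaultTransport:
    """Stand-in for the internal _HttpxTransport (hypothetical, illustration only)."""


class CachingTransport:
    """Stand-in for a user-supplied caching transport (hypothetical)."""


def build_client_kwargs(proxy_url: str | None, user_kwargs: dict[str, Any]) -> dict[str, Any]:
    # Defaults, mirroring the shape of the dict built in _get_client().
    kwargs: dict[str, Any] = {
        'transport': DefaultTransport(),
        'proxy': proxy_url,
    }
    # Same merge step as the quoted code: user-provided keys overwrite defaults.
    kwargs.update(user_kwargs)
    return kwargs


merged = build_client_kwargs(None, {'transport': CachingTransport()})
print(type(merged['transport']).__name__)
```

If the merge works this way in the real client, the same transport instance would be shared by every per-proxy client, which may or may not be what is wanted.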
Is there a way to customize the httpx client transport that I'm not seeing?
Or, instead of using a third-party library, does Crawlee have a native way to store long-term persistent caches of responses?
A somewhat related question: if it's not possible to customize the transport, is overriding HttpxHttpClient._get_client() the recommended way to use a custom httpx client, or is there a cleaner way?
hishel_client = await _create_hishel_client(cache_path)

class HishelCacheClient(HttpxHttpClient):
    def _get_client(self, proxy_url: str | None) -> httpx.AsyncClient:
        return hishel_client

http_client = HishelCacheClient()
crawler = BeautifulSoupCrawler(
    http_client=http_client,
)
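Note that the override above returns the same client for every proxy_url, so any proxy passed in is ignored. If proxies matter, one option might be to keep one client per proxy URL, as the original _get_client() does. Below is a rough sketch of that lookup pattern only. `PerProxyClients` and the dict-returning factory are hypothetical stand-ins, and a real factory would have to build a caching httpx.AsyncClient for the given proxy.

```python
from __future__ import annotations

from typing import Callable, Generic, TypeVar

ClientT = TypeVar('ClientT')


class PerProxyClients(Generic[ClientT]):
    """Hypothetical helper: lazily builds and reuses one client per proxy URL."""

    def __init__(self, factory: Callable[[str | None], ClientT]) -> None:
        self._factory = factory
        self._clients: dict[str | None, ClientT] = {}

    def get(self, proxy_url: str | None) -> ClientT:
        # Build the client on first use, then reuse it for the same proxy URL.
        if proxy_url not in self._clients:
            self._clients[proxy_url] = self._factory(proxy_url)
        return self._clients[proxy_url]


# Stand-in factory: a real one would create a caching client for the proxy.
clients = PerProxyClients(lambda proxy_url: {'proxy': proxy_url})

direct = clients.get(None)
proxied = clients.get('http://proxy.example:8000')
print(direct is clients.get(None))  # same proxy URL reuses the same client
print(direct is proxied)            # different proxy URL gets its own client
```

An overridden _get_client() could then delegate to such a helper instead of returning a single module-level client.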