Is it possible to pass in a custom transport? #801

@tleyden

I'm trying to add response caching via hishel transports, but I don't see a way to customize the transport used by the Crawlee client, since the client is created internally in _get_client():

    def _get_client(self, proxy_url: str | None) -> httpx.AsyncClient:
        """Helper to get a HTTP client for the given proxy URL.

        If the client for the given proxy URL doesn't exist, it will be created and stored.
        """
        if proxy_url not in self._client_by_proxy_url:
            # Prepare a default kwargs for the new client.
            kwargs: dict[str, Any] = {
                'transport': _HttpxTransport(
                    proxy=proxy_url,
                    http1=self._http1,
                    http2=self._http2,
                ),
                'proxy': proxy_url,
                'http1': self._http1,
                'http2': self._http2,
            }

            # Update the default kwargs with any additional user-provided kwargs.
            kwargs.update(self._async_client_kwargs)

            client = httpx.AsyncClient(**kwargs)
            self._client_by_proxy_url[proxy_url] = client

        return self._client_by_proxy_url[proxy_url]

Is there a way to customize the httpx client transport that I'm not seeing?

Or, instead of using a third-party library, does Crawlee have a native way to store long-term persistent caches of responses?
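
A note on the _get_client() snippet above: the default kwargs are built first and then updated with self._async_client_kwargs, so a user-supplied 'transport' key would appear to replace the default _HttpxTransport. The following is a minimal sketch of that merge order only, using plain stand-in values rather than httpx or Crawlee objects; build_client_kwargs and the string values are hypothetical names made up for illustration.

```python
from __future__ import annotations

from typing import Any


def build_client_kwargs(
    proxy_url: str | None,
    async_client_kwargs: dict[str, Any],
) -> dict[str, Any]:
    """Hypothetical stand-in that mirrors the merge order in _get_client()."""
    # Defaults are prepared first, as in the quoted snippet.
    kwargs: dict[str, Any] = {
        'transport': 'default-transport',  # stands in for _HttpxTransport(...)
        'proxy': proxy_url,
    }
    # User-provided kwargs are applied last, so they win on key collisions.
    kwargs.update(async_client_kwargs)
    return kwargs


# A user-supplied 'transport' replaces the default one.
merged = build_client_kwargs(None, {'transport': 'custom-caching-transport'})
print(merged['transport'])
```

Whether this helps in practice depends on how self._async_client_kwargs gets populated. If the HttpxHttpClient constructor forwards extra keyword arguments into it (an assumption, since the constructor is not shown here), passing a transport that way might be enough.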

A somewhat related question: if it's not possible to customize the transport, is overriding HttpxHttpClient._get_client() the recommended way to use a custom httpx client, or is there a cleaner way?

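    # Build the hishel-backed httpx client once (helper not shown here).
    hishel_client = await _create_hishel_client(cache_path)


    class HishelCacheClient(HttpxHttpClient):
        """Always return the prebuilt caching client."""

        def _get_client(self, proxy_url: str | None) -> httpx.AsyncClient:
            # proxy_url is ignored: every request shares the same client.
            return hishel_client


    http_client = HishelCacheClient()

    crawler = BeautifulSoupCrawler(
        http_client=http_client,
    )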
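
One caveat with the override above: it returns the same hishel_client for every proxy_url, so a proxy chosen by the crawler would presumably have no effect unless that client is configured for it separately. A possible variation keeps the per-proxy lookup from the original _get_client() and only swaps how each client is built. The sketch below shows that pattern with stand-in objects; FakeClient, PerProxyClientCache, and the factory argument are hypothetical names for illustration, not part of Crawlee, httpx, or hishel.

```python
from __future__ import annotations

from typing import Callable


class FakeClient:
    """Hypothetical stand-in for an httpx.AsyncClient with a caching transport."""

    def __init__(self, proxy_url: str | None) -> None:
        self.proxy_url = proxy_url


class PerProxyClientCache:
    """Keeps one client per proxy URL, built by a user-supplied factory."""

    def __init__(self, factory: Callable[[str | None], FakeClient]) -> None:
        self._factory = factory
        self._client_by_proxy_url: dict[str | None, FakeClient] = {}

    def get_client(self, proxy_url: str | None) -> FakeClient:
        # Same lookup-then-create shape as the original _get_client().
        if proxy_url not in self._client_by_proxy_url:
            self._client_by_proxy_url[proxy_url] = self._factory(proxy_url)
        return self._client_by_proxy_url[proxy_url]


cache = PerProxyClientCache(factory=FakeClient)
direct = cache.get_client(None)
proxied = cache.get_client('http://proxy.example:8080')

print(direct is cache.get_client(None))  # the cached client is reused
print(direct is proxied)  # a different proxy gets its own client
```

In a subclass of HttpxHttpClient, the factory would be whatever builds the hishel-backed client for a given proxy_url. That part is left out here because it depends on hishel's and httpx's APIs.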

Labels: enhancement (New feature or request), t-tooling (Issues with this label are in the ownership of the tooling team)
