-
Notifications
You must be signed in to change notification settings - Fork 454
feat: Handle unprocessed requests in add_requests_batched
#1159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds retry logic for unprocessed requests in the add_requests_batched function and introduces a unit test to validate the behavior.
- Implements automatic retry for unprocessed requests with a maximum of 5 attempts.
- Adds a new test (test_add_batched_requests_with_retry) to simulate unprocessed requests and verify that the total count is correctly updated.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
tests/unit/storages/test_request_queue.py | Added test to verify retry behavior and request count adjustments |
src/crawlee/storages/_request_queue.py | Updated add_requests_batched to use a new _process_batch method with retry logic |
Comments suppressed due to low confidence (1)
tests/unit/storages/test_request_queue.py:309
- The call to service_locator.get_storage_client() is not assigned to any variable or used in any manner. Consider removing it if it is unnecessary.
service_locator.get_storage_client()
|
||
async def _process_batch(self, batch: Sequence[Request], base_retry_wait: timedelta, attempt: int = 1) -> None: | ||
max_attempts = 5 | ||
response = await self._resource_client.batch_add_requests(batch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you're at it, could you please make sure that ApifyStorageClient
in the SDK does not do any retries of its own?
Of course, this is assuming we find that undesirable, but I firmly believe that the retry logic should be concentrated in a single place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think it does. The retries that happens are currently in apify-client-python. First in Http clients and then in RequestQueueClients. Those retries are on different level and are both dealing with status codes of http responses and are not doing retries based on parsed content of successful API call - which is the case in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So ApifyStorageClient
does its own retries via apify-client-python
. Because the apify-client implementation also does its own kind of batching, the situation gets extra blurry. It doesn't make sense to change that in this PR, but I think we should make an issue to resolve this some day. If you agree, that is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree: apify/apify-client-python#390
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
### Description Adds retry to unprocessed requests in call `add_requests_batched`. Retry calls recursively `_process_batch`, which initially works on full request batch and then on batches of unprocessed requests until retry limit is reached or all requests are processed. Each retry is done after linearly increasing delay with each attempt. Unprocessed requests are not counted in `request_queue.get_total_count` Add test. ### Issues - Closes: [Handle unprocessed requests in batch_add_requests](apify/apify-sdk-python#456)
### Description Adds retry to unprocessed requests in call `add_requests_batched`. Retry calls recursively `_process_batch`, which initially works on full request batch and then on batches of unprocessed requests until retry limit is reached or all requests are processed. Each retry is done after linearly increasing delay with each attempt. Unprocessed requests are not counted in `request_queue.get_total_count` Add test. ### Issues - Closes: [Handle unprocessed requests in batch_add_requests](apify/apify-sdk-python#456)
Description
Adds retry to unprocessed requests in call
add_requests_batched
.Retry calls recursively
_process_batch
, which initially works on full request batch and then on batches of unprocessed requests until retry limit is reached or all requests are processed. Each retry is done after linearly increasing delay with each attempt.Unprocessed requests are not counted in
request_queue.get_total_count
Add test.
Issues