
Conversation

Pijukatel
Collaborator

Description

The `max_request_retries` argument of `BasicCrawler` previously counted the initial request as one of the retries. Now it counts only the actual retries.
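For illustration, a minimal sketch of the new behavior (not part of this PR; it assumes the crawlee-python public API with `crawlee.crawlers.BasicCrawler` and a deliberately failing handler):

import asyncio

from crawlee.crawlers import BasicCrawler, BasicCrawlingContext


async def main() -> None:
    # Under the new semantics, max_request_retries=3 allows up to
    # 1 initial attempt + 3 retries = 4 handler calls per request.
    crawler = BasicCrawler(max_request_retries=3)
    attempts: list[str] = []

    @crawler.router.default_handler
    async def handler(context: BasicCrawlingContext) -> None:
        attempts.append(context.request.url)
        raise RuntimeError('Simulated failure to force a retry.')

    await crawler.run(['https://a.placeholder.com'])
    print(len(attempts))  # Expected: 4 (previously 3 under the old counting).


if __name__ == '__main__':
    asyncio.run(main())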

Issues

Pijukatel added the labels bug (Something isn't working.) and t-tooling (Issues with this label are in the ownership of the tooling team.) on Jul 29, 2025
github-actions bot added this to the 120th sprint - Tooling team milestone on Jul 29, 2025
github-actions bot added the label tested (Temporary label used only programmatically for some analytics.) on Jul 29, 2025
Pijukatel requested a review from vdusek on July 29, 2025 07:04
Pijukatel marked this pull request as ready for review on July 29, 2025 07:04
'https://c.placeholder.com',
'https://b.placeholder.com',
'https://b.placeholder.com',
'https://b.placeholder.com',
Collaborator

Is this intentional?

Collaborator Author

Yes, the default number of retries is 3, so with this PR it will make 4 calls (the original request + 3 retries).

Comment on lines -191 to -192
'https://c.placeholder.com',
'https://c.placeholder.com',
Collaborator

Is this intentional?

Collaborator Author

Yes, 1 retry means 1 call + 1 retry, i.e. 2 calls in total.
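For completeness, the arithmetic under the new semantics (illustrative only, not code from the PR):

max_request_retries = 1
total_calls = 1 + max_request_retries  # 1 initial attempt + 1 retry == 2 calls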

Comment on lines -216 to -226
# Retrieve or initialize the headers, and extract the current custom retry count.
headers = context.request.headers or HttpHeaders()
custom_retry_count = int(headers.get('custom_retry_count', '0'))

# Append the current call information.
calls.append(Call(context.request.url, error, custom_retry_count))

# Update the request to include an incremented custom retry count in the headers and return it.
request = context.request.model_dump()
request['headers'] = HttpHeaders({'custom_retry_count': str(custom_retry_count + 1)})
return Request.model_validate(request)
Collaborator

Is this just some optimization?

Collaborator Author

Yes, I think this was more complicated than the test's intention required, and the same can be achieved with a simpler setup.
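As an aside, one hypothetical shape such a simpler setup could take (assuming the `@crawler.error_handler` decorator and the built-in `context.request.retry_count` counter, with `crawler`, `calls`, and `Call` coming from the test's existing setup; a sketch, not the code merged in this PR):

# Hypothetical simplification: read the retry count the crawler already
# maintains on the request instead of threading a custom counter
# through the headers.
@crawler.error_handler
async def error_handler(context: BasicCrawlingContext, error: Exception) -> None:
    calls.append(Call(context.request.url, error, context.request.retry_count))
    # Returning None (implicitly) keeps the original request for the retry.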

Pijukatel requested a review from vdusek on July 29, 2025 08:34
@janbuchar
Collaborator

Just curious, is there a test that checks this in the JS version?

Collaborator

@vdusek left a comment

lgtm

@Pijukatel
Collaborator Author

Just curious, is there a test that checks this in the JS version?

I think this should cover it:
https://github.com/apify/crawlee/blob/master/test/core/crawlers/basic_crawler.test.ts#L414

Pijukatel merged commit 74fa1d9 into master on Jul 29, 2025
19 checks passed
Pijukatel deleted the fix-retry-count branch on July 29, 2025 12:42
Pijukatel added a commit that referenced this pull request Jul 30, 2025
The `max_request_retries` argument of `BasicCrawler` previously counted
the initial request as one of the retries. Now it counts only the actual retries.

- Closes: #1326
Successfully merging this pull request may close these issues.

max_request_retries should not include the original request