Conversation

strombergdev (Contributor)

No description provided.

@jhodges10 changed the title from "improve downloader ram usage and make replace optional" to "Improve Downloader RAM usage and add optional replace file boolean" on Jul 7, 2020
self.retry_strategy = Retry(
total=3,
backoff_factor=1,
status_forcelist=[429],
@jhodges10 (Contributor), Jul 9, 2020

For this, I think you'll want to add more statuses so that it catches more than just a rate-limit error, because the expected failure here is unlikely to be a 429. There might also be other ways of ensuring that we retry a failed download.
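
A broadened policy, mounted on a `requests` session so every call picks it up, might look like this sketch (the status list here is illustrative, not this PR's final choice):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry on rate limiting plus transient server-side failures.
retry_strategy = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
)

session = requests.Session()
# Mounting the adapter applies the retry policy to every HTTPS request
# made through this session.
session.mount("https://", HTTPAdapter(max_retries=retry_strategy))
```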

@strombergdev (Contributor, Author)

Ok! I might have been a little quick here. Testing some other error codes.

Do you think we should add an xxhash check here and retry the download if it fails? Then I can work on a solution to retry, say, 3 times if the hashes don't match.

@jhodges10 (Contributor)

That's not a bad idea! If you want to add it, feel free.

@jhodges10 (Contributor)

Doing the xxhash check on download is much easier than on upload: the delay on the upload side would require putting the QC into a queue, managing that state, and creating a background thread to check on it continuously.
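
A download-side verification loop along these lines could look like the following sketch. `hashlib.sha256` stands in for xxhash so the example stays dependency-free, and `download_file`, `verify_or_retry`, and `MAX_ATTEMPTS` are hypothetical names, not this PR's actual code:

```python
import hashlib
import os

MAX_ATTEMPTS = 3

def calculate_hash(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so large downloads aren't loaded into RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_or_retry(download_file, destination, original_checksum):
    for attempt in range(MAX_ATTEMPTS):
        download_file(destination)
        if not original_checksum:
            # Older files without a stored checksum: skip verification.
            return destination
        if calculate_hash(destination) == original_checksum:
            return destination
        os.remove(destination)  # corrupt copy; try again
    raise IOError("Checksum mismatch after %d attempts" % MAX_ATTEMPTS)
```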

@strombergdev (Contributor, Author)

Check the latest commit 😉, and yes, uploads are a bit trickier!

self.http_retry_strategy = Retry(
total=3,
backoff_factor=1,
status_forcelist=[408, 500, 502, 503, 504],
@jhodges10 (Contributor)

Where did you get this list of error codes? For reference, S3 will only throw a couple of errors we should care about (almost all of which are 4xx not 5xx).

https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html#ErrorCodeList

@strombergdev (Contributor, Author)

My thinking was that 400, 403, 409, etc. are user-related issues, so they shouldn't be retried; instead we should retry temporary server-side errors. But maybe S3 never throws those error codes anyway, so that might be incorrect.

What do you suggest? I am at a loss here :)

continue

if not original_checksum:
break
@jhodges10 (Contributor)

If we break here, what does this error look like in the console/logs?

@strombergdev (Contributor, Author)

I added this for older files and edge cases without an FIO checksum. So it's not really an error; we just download the file and skip verification.

if not original_checksum:
break

if calculate_hash(final_destination) == original_checksum:
@jhodges10 (Contributor)

I'd like to return the output path as an absolute path if it's a successful download. I think that's a really useful thing!

@strombergdev (Contributor, Author)

Sure! Do you think we should do that for the case above as well, where the file isn't verified? My thought was that verification is handled under the hood, and we don't inform the calling function of whether it's done or not.
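
The suggested return could be as simple as normalizing the path on success; a minimal sketch (`resolve_output` is a hypothetical helper name, and `final_destination` mirrors the variable in the snippet above):

```python
import os

def resolve_output(final_destination):
    """Return the downloaded file's location as an absolute path, so callers
    get a consistent, usable path regardless of how the download was invoked."""
    return os.path.abspath(final_destination)
```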

@subsetpark subsetpark removed their request for review January 26, 2021 14:27
@jhodges10 (Contributor)

This is going to end up getting rolled into #73

@jhodges10 added the labels "duplicate" (This issue or pull request already exists) and "enhancement" (New feature or request) on Jun 10, 2021
@strombergdev closed this by deleting the head repository on Apr 26, 2024