-
Notifications
You must be signed in to change notification settings - Fork 26
Improve Downloader RAM usage and add optional replace file boolean #54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Downloader RAM usage and add optional replace file boolean #54
Conversation
frameioclient/download.py
Outdated
self.retry_strategy = Retry( | ||
total=3, | ||
backoff_factor=1, | ||
status_forcelist=[429], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this, I think you'll want to add more statuses so that it catches on more than just a rate limit error because the expected failure here is not very likely to be a 429
. I also think that there might be other ways of ensuring that we retry a failed download.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok! Might have been a little quick here. Testing some other error codes.
Do you think we should add a xxhash check here and retry the download if it fails? Because then I can work on a solution to retry 3x times for example if hashes don't match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not a bad idea! If you want to add it, feel free.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing the xxhash on download is much easier than on upload, because of the delay on the upload side that would require putting the QC into a queue and then managing that state and creating a background thread to check on it continuously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the latest commit 😉 and yes uploads are a bit trickier!
self.http_retry_strategy = Retry( | ||
total=3, | ||
backoff_factor=1, | ||
status_forcelist=[408, 500, 502, 503, 504], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did you get this list of error codes? For reference, S3 will only throw a couple of errors we should care about (almost all of which are 4xx not 5xx).
https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html#ErrorCodeList
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking was that 400, 403, 409 etc was user-related issues so they shouldn't be retried. Instead we should retry temporary server-side errors. But maybe S3 never throws those error codes anyway so that might be incorrect.
What do you suggest? I am at a loss here :)
continue | ||
|
||
if not original_checksum: | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we break
here, what does this error look like in the console/logs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this for older files and edge cases without FIO checksum. So it's not really an error, just download the file and skip verification.
if not original_checksum: | ||
break | ||
|
||
if calculate_hash(final_destination) == original_checksum: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to return the output path as an absolute path if it's a successful download. I think that's a really useful thing!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Do you think we should do that for the case above as well where the file isn't verified? My thought was that verification is handled under the hood and we don't inform the calling function of whether it's done or not.
This is going to end up getting rolled into #73 |
No description provided.