-
Notifications
You must be signed in to change notification settings - Fork 26
Improve Downloader RAM usage and add optional replace file boolean #54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,18 +1,62 @@ | ||
import requests | ||
import math | ||
import os | ||
import requests | ||
from requests.adapters import HTTPAdapter | ||
from requests.packages.urllib3.util.retry import Retry | ||
from frameioclient.utils import calculate_hash | ||
|
||
|
||
class FrameioDownloader(object): | ||
def __init__(self, asset, download_folder): | ||
def __init__(self, asset, download_folder, replace): | ||
jhodges10 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
self.asset = asset | ||
self.download_folder = download_folder | ||
self.replace = replace | ||
self.attempts = 0 | ||
self.retry_limit = 3 | ||
|
||
self.http_retry_strategy = Retry( | ||
total=3, | ||
backoff_factor=1, | ||
status_forcelist=[408, 500, 502, 503, 504], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Where did you get this list of error codes? For reference, S3 will only throw a couple of errors we should care about (almost all of which are 4xx not 5xx). https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html#ErrorCodeList There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My thinking was that 400, 403, 409 etc was user-related issues so they shouldn't be retried. Instead we should retry temporary server-side errors. But maybe S3 never throws those error codes anyway so that might be incorrect. What do you suggest? I am at a loss here :) |
||
method_whitelist=['GET'] | ||
) | ||
|
||
def download(self): | ||
original_filename = self.asset['name'] | ||
final_destination = os.path.join(self.download_folder, original_filename) | ||
|
||
|
||
if os.path.isfile(final_destination) and not self.replace: | ||
try: | ||
raise FileExistsError | ||
except NameError: | ||
raise OSError('File exists') # Python < 3.3 | ||
|
||
adapter = HTTPAdapter(max_retries=self.http_retry_strategy) | ||
http = requests.Session() | ||
http.mount('https://', adapter) | ||
|
||
url = self.asset['original'] | ||
r = requests.get(url) | ||
|
||
open(final_destination, 'wb').write(r.content) | ||
|
||
|
||
try: | ||
original_checksum = self.asset['checksums']['xx_hash'] | ||
except (TypeError, KeyError): | ||
original_checksum = None | ||
|
||
while self.attempts < self.retry_limit: | ||
r = http.request('GET', url, stream=True) | ||
|
||
with open(final_destination, 'wb') as handle: | ||
try: | ||
for chunk in r.iter_content(chunk_size=4096): | ||
if chunk: | ||
handle.write(chunk) | ||
except requests.exceptions.ChunkedEncodingError: | ||
self.attempts += 1 | ||
continue | ||
|
||
if not original_checksum: | ||
break | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added this for older files and edge cases without FIO checksum. So it's not really an error, just download the file and skip verification. |
||
|
||
if calculate_hash(final_destination) == original_checksum: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd like to return the output path as an absolute path if it's a successful download. I think that's a really useful thing! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure! Do you think we should do that for the case above as well where the file isn't verified? My thought was that verification is handled under the hood and we don't inform the calling function of whether it's done or not. |
||
break | ||
|
||
self.attempts += 1 |
Uh oh!
There was an error while loading. Please reload this page.