Dataloader Upgrades #114
Conversation
1. Before, the `dataloader_spec`, `batch_size`, and `dataloader_hparams` were passed as arguments into the trainer. Now, the trainer is initialized with a dataloader (or a dataloader, split_fn, and preprocessing_fn tuple). This change makes the `DataloaderSpec` optional and hidden from the user for simple datasets that do not require custom preprocessing or split functions.
2. Removed `dataloader_to_device` and replaced it with explicit calls in the training loop to 1) move data onto the device, and 2) execute the preprocessing fn. The preprocessing fn is renamed to the device transformation fn. Removed the option to execute the device transformation fn in a CUDA stream, since it did not add any performance improvement. When using memory pinning, `batch_to_device` should be a no-op, since the dataloader would have already moved the data onto the GPU. (See the sketch after this list.)

TODO:
- [ ] Regression test on the ResNet base to ensure no throughput or accuracy degradations
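A minimal sketch of the training-loop pattern item 2 describes, assuming PyTorch-style dataloaders. The `batch_to_device` and `device_transform_fn` helpers below are illustrative stand-ins for the behavior described above, not the project's actual implementations.

```python
# Sketch only: explicit device move + device transform replacing the old
# dataloader_to_device wrapper. Helper names here are assumptions.
import torch
from torch.utils.data import DataLoader, TensorDataset


def batch_to_device(batch, device):
    # With pin_memory and non_blocking copies this is cheap; if the dataloader
    # already placed tensors on the GPU, this effectively becomes a no-op.
    return tuple(t.to(device, non_blocking=True) for t in batch)


def device_transform_fn(batch):
    # Placeholder for on-device preprocessing, e.g. casting and normalization.
    x, y = batch
    return x.float() / 255.0, y


dataset = TensorDataset(
    torch.randint(0, 256, (64, 3, 8, 8), dtype=torch.uint8),
    torch.randint(0, 10, (64,)),
)
loader = DataLoader(dataset, batch_size=16, pin_memory=torch.cuda.is_available())
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

for batch in loader:
    batch = batch_to_device(batch, device)   # explicit move onto the device
    x, y = device_transform_fn(batch)        # explicit device transformation
    # forward/backward pass would go here
```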
This looks good - I'm very happy to see `DataloaderSpec` losing favor. The only thing I wonder is whether `split_fn` and `device_transform_fn` could be removed. The former I think is unnecessary if we just load N microbatches instead of 1 batch, and the latter could be replaced with explicit augmentations?
Manually tested on a GPU instance. Throughput for a 2-wide box was 1290.
I'm getting a bit confused as to how the

Also, can we rename
Discussed with @ravi-mosaicml and I think this looks good to me pending throughput sanity checks on ImageNet.