
Conversation

@ravi-mosaicml (Contributor) commented Nov 30, 2021

1. Before, the `dataloader_spec`, `batch_size`, and `dataloader_hparams` were passed as arguments into the trainer. Now, the trainer is initialized with a dataloader (or a dataloader, `split_fn`, and `preprocessing_fn` tuple). This change makes the `DataloaderSpec` optional and hidden from the user for simple datasets that do not require custom preprocessing or split functions.

2. Removed `dataloader_to_device` and replaced it with explicit calls in the training loop to 1) move the data onto the device and 2) execute the preprocessing fn, which is renamed to the device transformation fn. Removed the option to execute the device transformation fn in a CUDA stream, since that did not add any performance improvement. When memory pinning is used, `batch_to_device` should be a no-op, since the dataloader will have already moved the data onto the GPU. A rough sketch of both changes follows below.
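A rough sketch of both changes, assuming hypothetical names (`DataloaderArg`, `unpack_dataloader`, `run_batch`, and a `device` object with a `batch_to_device` method) rather than the final signatures:

```python
from typing import Callable, Optional, Tuple, Union

from torch.utils.data import DataLoader

# The trainer now accepts either a bare dataloader or a
# (dataloader, split_fn, device_transform_fn) tuple.
DataloaderArg = Union[DataLoader, Tuple[DataLoader, Callable, Callable]]


def unpack_dataloader(arg: DataloaderArg):
    """Normalize the argument into (dataloader, split_fn, device_transform_fn)."""
    if isinstance(arg, tuple):
        return arg
    return arg, None, None


def run_batch(batch, device, device_transform_fn: Optional[Callable]):
    """The two formerly-implicit per-batch steps, now explicit in the training loop."""
    # 1) Move the batch onto the device. Per the description above, this should be
    #    effectively a no-op when memory pinning is used.
    batch = device.batch_to_device(batch)
    # 2) Apply the optional device transformation fn on the device.
    if device_transform_fn is not None:
        batch = device_transform_fn(batch)
    return batch
```

Passing a bare `DataLoader` covers the simple case, while the tuple form keeps custom split and device transform behavior available.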

TODO:

- [ ] Regression test on the ResNet baseline to ensure no throughput or accuracy degradations

@jbloxham (Contributor) left a comment:

This looks good; I'm very happy to see `DataloaderSpec` losing favor. The only thing I wonder is whether `split_fn` and `device_transform_fn` could be removed. The former, I think, is unnecessary if we just load N microbatches instead of 1 batch, and the latter could be replaced with explicit augmentations?
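For context, a default `split_fn` would roughly just chunk a batch into microbatches. A minimal sketch, assuming an (inputs, targets) batch format and the hypothetical name `default_split_fn`:

```python
import torch


def default_split_fn(batch, num_microbatches: int):
    """Chunk an (inputs, targets) batch into num_microbatches microbatches."""
    inputs, targets = batch
    return list(zip(inputs.chunk(num_microbatches), targets.chunk(num_microbatches)))


# e.g. a batch of 256 samples -> 4 microbatches of 64 for gradient accumulation
batch = (torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))
microbatches = default_split_fn(batch, num_microbatches=4)
assert len(microbatches) == 4 and microbatches[0][0].shape[0] == 64
```

Loading N microbatches directly from the dataloader, as suggested above, would make this step unnecessary.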

@ravi-mosaicml (Contributor, Author) commented:

Manually tested on a GPU instance. Throughput for a 2-wide box was 1290.

@abhi-mosaic (Contributor) commented Dec 3, 2021

I'm getting a bit confused about how `train_batch_size` is computed and saved; just to clarify:

- `Trainer` no longer knows the batch size at init. Instead, it looks at its device's dataloader and the world size, computes `train_batch_size`, and then creates `State` with this value.
- `TrainerHparams`, which has a field for `train_batch_size`, must carefully create each device dataloader with a device batch size of `train_batch_size / world_size`, so that when the value is re-derived by `Trainer` it comes out correct (a small sketch of this bookkeeping is below).

Also, can we rename `total_batch_size` -> `train_batch_size`? It's always weirded me out.
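A small sketch of that bookkeeping, with made-up numbers:

```python
world_size = 8                        # number of devices / training processes
train_batch_size = 2048               # global batch size from TrainerHparams
device_batch_size = train_batch_size // world_size  # 256 per device dataloader

# At init, the trainer only sees its device dataloader plus the world size,
# so it re-derives the global value that goes into State:
assert device_batch_size * world_size == train_batch_size
```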

@abhi-mosaic (Contributor) left a comment:

Discussed with @ravi-mosaicml and I think this looks good to me pending throughput sanity checks on ImageNet.

@ravi-mosaicml ravi-mosaicml merged commit 686aab9 into dev Dec 3, 2021
@ravi-mosaicml ravi-mosaicml deleted the ravi/dataloaders_in_trainer branch December 3, 2021 01:12
coryMosaicML pushed a commit to coryMosaicML/composer that referenced this pull request Feb 23, 2022