Don't init dist when world_size is 1 #311

jbloxham · 2022-01-31T21:43:06Z

The distributed process group doesn't need to be initialized when we're doing single-process training, and skipping initialization can help avoid crashes in certain environments. Note that the DDP initialization code is already a no-op when the distributed process group is not initialized.
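For readers who want to see the idea in isolation, here is a minimal sketch of skipping process-group initialization when the world size is 1. It is not the code in `composer/utils/dist.py`; the helper names `get_world_size` and `maybe_initialize_dist` are hypothetical, and reading `WORLD_SIZE` from the environment with a default of 1 is an assumption made for illustration.

```python
import os
from typing import Mapping, Optional


def get_world_size(env: Optional[Mapping[str, str]] = None) -> int:
    """Read WORLD_SIZE from the environment (hypothetical helper).

    Assumption: an unset WORLD_SIZE means single-process training.
    """
    env = os.environ if env is None else env
    return int(env.get("WORLD_SIZE", "1"))


def maybe_initialize_dist(backend: str = "gloo",
                          env: Optional[Mapping[str, str]] = None) -> bool:
    """Initialize the process group only for multi-process training.

    Returns True if this call initialized the process group.
    """
    if get_world_size(env) == 1:
        # Single-process training: leave the process group uninitialized.
        return False

    # Imported lazily so single-process runs never touch torch.distributed.
    import torch.distributed as dist

    if dist.is_available() and not dist.is_initialized():
        # With the default env:// init method, PyTorch reads RANK,
        # WORLD_SIZE, MASTER_ADDR and MASTER_PORT from os.environ.
        dist.init_process_group(backend=backend)
        return True
    return False
```

With no `WORLD_SIZE` set, `maybe_initialize_dist()` returns `False` without importing `torch.distributed`, so any later code that checks `dist.is_initialized()` before wrapping the model in DDP would leave the model unwrapped.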

Tested by running models with various world sizes (and no explicit world size), and using a print statement to verify whether the model was the original model class or a DDP wrapper.

composer/utils/dist.py

ravi-mosaicml

LGTM

composer/utils/dist.py

Co-authored-by: ravi-mosaicml <[email protected]>

* cleanup dist init
* specify node rank
* rewrite _get_distributed_config_var for better error handling
* change a warning to an error
* woops
* dummy commit to trigger jenkins
* Update composer/utils/dist.py

  Co-authored-by: ravi-mosaicml <[email protected]>
* woops
* remove print statement

Co-authored-by: ravi-mosaicml <[email protected]>

cleanup dist init

28c83b1

jbloxham commented Jan 31, 2022

Review comment on composer/utils/dist.py (resolved)

jbloxham linked an issue Jan 31, 2022 that may be closed by this pull request

Do not run DDP for users that are running on a single GPU #278

Closed

jbloxham requested a review from ravi-mosaicml January 31, 2022 21:51

ravi-mosaicml approved these changes Jan 31, 2022

Review comments on composer/utils/dist.py (one resolved; one outdated and resolved)

root added 4 commits January 31, 2022 22:07

specify node rank

4f6cd25

rewrite _get_distributed_config_var for better error handling

6c9e1dd

change a warning to an error

2876b02

woops

9094a88

jbloxham marked this pull request as ready for review January 31, 2022 22:31

dummy commit to trigger jenkins

6de1a5a

ravi-mosaicml reviewed Jan 31, 2022

Review comment on composer/utils/dist.py (outdated, resolved)

Jamie Bloxham and others added 4 commits January 31, 2022 14:39

Update composer/utils/dist.py

5e9a3ac

Co-authored-by: ravi-mosaicml <[email protected]>

Merge branch 'dev' into avoid-ddp-for-single-process

a580265

woops

94266cb

remove print statement

dee8d06

jbloxham merged commit 00fbf95 into mosaicml:dev Jan 31, 2022
