@jbloxham jbloxham commented Jan 5, 2022

The distributed runtime and the DDP engine are distinct entities, but our code has been treating them almost as synonyms. This is already causing some confusion in parts of the DeepSpeed integration, and it will only get worse if we experiment with other parallelism techniques like model and pipeline parallelism. The purpose of this PR is to separate out DDP-specific code from anything that just deals with the distributed runtime in general.

This is purely a refactor of something that was making me unhappy.
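
To illustrate the kind of split described above, here is a minimal sketch. It is not the code from this PR: every name in it (`get_world_size`, `get_global_rank`, `is_distributed`, `DDPSettings`, `describe_run`) is hypothetical. It assumes a launcher such as `torchrun` that exports the `WORLD_SIZE` and `RANK` environment variables.

```python
import os
from dataclasses import dataclass
from typing import Optional


# Runtime-level helpers (hypothetical names). They only describe the process
# group, so they would apply equally to DDP, DeepSpeed, or pipeline parallelism.
def get_world_size() -> int:
    """Total number of processes, read from WORLD_SIZE (defaults to 1)."""
    return int(os.environ.get("WORLD_SIZE", "1"))


def get_global_rank() -> int:
    """Rank of this process, read from RANK (defaults to 0)."""
    return int(os.environ.get("RANK", "0"))


def is_distributed() -> bool:
    """True when more than one process takes part in the run."""
    return get_world_size() > 1


# Engine-level settings (hypothetical container). Only code that actually
# wraps the model in DDP should need to know about these.
@dataclass
class DDPSettings:
    # Mirrors the DistributedDataParallel argument of the same name.
    find_unused_parameters: bool = False


def describe_run(ddp: Optional[DDPSettings]) -> str:
    """Summarize the run; the runtime facts do not depend on the engine."""
    engine = "ddp" if ddp is not None else "none"
    return f"rank {get_global_rank()}/{get_world_size()} engine={engine}"
```

With this layout, a DeepSpeed or pipeline-parallel code path can call the runtime helpers without importing anything DDP-specific.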

@jbloxham jbloxham marked this pull request as ready for review January 5, 2022 23:01

@ravi-mosaicml ravi-mosaicml left a comment

LGTM, thanks for doing this!

@jbloxham jbloxham merged commit 1566ce2 into mosaicml:dev Jan 6, 2022
coryMosaicML pushed a commit to coryMosaicML/composer that referenced this pull request Feb 23, 2022
* dist, not ddp
* simplify ClosureGradScaler
* formatting
* formatting and more fixes
* that did not save
* small fixes
* dont need to worry about circular dependencies any longer
* dumb pyright fix
* woops