Add `scale_warmup` argument to schedulers #1268

hanlint · 2022-07-08T14:49:46Z

Based on a request from @codestar12, this PR adds an option to also scale the schedulers' warmup period by the scale_schedule_ratio (ssr).

For example:

    from composer.optim.scheduler import MultiStepWithWarmupScheduler

    scheduler = MultiStepWithWarmupScheduler(
        milestones=["10ep", "20ep"],
        t_warmup="4ep",
        scale_warmup=True,
    )

With scale_schedule_ratio=0.5, this scheduler will warm up for 2 epochs, then step at 5 and 10 epochs. scale_warmup defaults to False to preserve the current behavior.
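The arithmetic behind that example can be sketched in plain Python. This is only an illustration, not the code in composer/optim/scheduler.py: the helper `scaled_schedule` is hypothetical, and rounding with `round` is an assumption about how fractional results would be handled.

```python
from typing import List, Tuple


def scaled_schedule(
    milestones_ep: List[int],
    t_warmup_ep: int,
    ssr: float,
    scale_warmup: bool = False,
) -> Tuple[int, List[int]]:
    """Hypothetical helper: return (warmup_epochs, milestone_epochs) after applying ssr.

    Milestones are always scaled; the warmup is scaled only if scale_warmup is True.
    Rounding via round() is an assumption made for this sketch.
    """
    scaled_milestones = [round(m * ssr) for m in milestones_ep]
    warmup = round(t_warmup_ep * ssr) if scale_warmup else t_warmup_ep
    return warmup, scaled_milestones


# scale_warmup=True: warmup shrinks along with the milestones.
warmup_on, milestones_on = scaled_schedule([10, 20], 4, ssr=0.5, scale_warmup=True)

# scale_warmup=False (the default): warmup keeps its original length.
warmup_off, milestones_off = scaled_schedule([10, 20], 4, ssr=0.5)

print(warmup_on, milestones_on)    # 2 [5, 10]
print(warmup_off, milestones_off)  # 4 [5, 10]
```

With the default, the 4-epoch warmup is unchanged while the milestones still move to 5 and 10 epochs.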

During implementation, I observed an unintuitive default behavior with our warmup. Suppose we have max_duration=100ba. If we define a scheduler with a warmup in batches (e.g. t_warmup=10ba) and apply an ssr, the warmup period will not be scaled, per our default behavior. However, if the warmup is defined as a fraction of the duration (e.g. t_warmup=0.1dur) and we apply an ssr, the warmup period will always be scaled, because the trainer scales max_duration beforehand. This PR preserves that behavior, but perhaps we should attempt a fix separately.
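To make the discrepancy concrete, here is a rough sketch of the two cases with max_duration=100ba and ssr=0.5. The function `warmup_batches` is a hypothetical stand-in, not the trainer's or scheduler's actual logic. It assumes the trainer scales max_duration first, that a dur-based warmup is resolved against the already scaled max_duration, and that results are rounded with `round`.

```python
def warmup_batches(
    t_warmup: str,
    max_duration_ba: int,
    ssr: float,
    scale_warmup: bool = False,
) -> int:
    """Hypothetical sketch: resolve a warmup string ("10ba" or "0.1dur") to batches."""
    # Assumption: the trainer has already scaled max_duration by ssr.
    scaled_max_duration = round(max_duration_ba * ssr)

    if t_warmup.endswith("dur"):
        # A fraction of the (already scaled) duration, so it shrinks
        # with ssr regardless of scale_warmup.
        fraction = float(t_warmup[: -len("dur")])
        return round(fraction * scaled_max_duration)

    if t_warmup.endswith("ba"):
        # An absolute batch count, scaled only when scale_warmup is True.
        batches = int(t_warmup[: -len("ba")])
        return round(batches * ssr) if scale_warmup else batches

    raise ValueError(f"unsupported unit in t_warmup: {t_warmup!r}")


print(warmup_batches("10ba", 100, ssr=0.5))                     # 10 (not scaled)
print(warmup_batches("0.1dur", 100, ssr=0.5))                   # 5  (always scaled)
print(warmup_batches("10ba", 100, ssr=0.5, scale_warmup=True))  # 5  (opt-in scaling)
```

Under these assumptions, two warmups that are identical without an ssr (10 of 100 batches) diverge once an ssr is applied, unless scale_warmup=True is set for the batch-based one.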

Closes https://mosaicml.atlassian.net/browse/CO-668

ravi-mosaicml · 2022-07-11T16:29:44Z

CC @jbloxham since he originally wrote the schedulers
CC @jfrankle on whether the warmup period should be scaled along with the schedule -- I was under the impression that it shouldn't, but I don't know the research background here. Though I suppose if it's a non-default option, then that's fine.

ravi-mosaicml

LGTM. Can you create a JIRA to track the fix for scale_warmup=False when the warmup period is specified in dur?

composer/optim/scheduler.py

hanlint added 2 commits July 8, 2022 07:43

add scale_warmup to schedulers

8c8aac2

add to scheduler docs

05cb72c

hanlint requested a review from codestar12 July 8, 2022 14:49

fix docstrings

9f96978

hanlint requested a review from ravi-mosaicml July 8, 2022 20:14

ravi-mosaicml requested a review from jfrankle July 11, 2022 16:29

ravi-mosaicml approved these changes Jul 11, 2022


Merge branch 'dev' into hanlin/scale_warmup

b6b5d7f

hanlint merged commit 5b9f26f into mosaicml:dev Jul 12, 2022

ravi-mosaicml pushed a commit that referenced this pull request Jul 16, 2022

Add scale_warmup argument to schedulers (#1268)

f35ba92
