Skip to content

Conversation

hanlint
Copy link
Contributor

@hanlint hanlint commented Jul 8, 2022

Based on request from @codestar12 , this PR adds the option to also scale the warmup period for our schedulers with the scale_schedule_ratio.

For example:

    from composer.optim.scheduler import MultiStepWithWarmupScheduler

    scheduler = MultiStepWithWarmupScheduler(
        milestones=["10ep", "20ep"],
        t_warmup="4ep",
        scale_warmup=True,
    )

With scale_schedule_ratio=0.5, this scheduler will warmup for 2 epochs, then step at 5 and 10 epochs. scale_warmup defaults to False to preserve the current behavior.

During implementation, I observed an unintuitive default behavior with our warmup. Suppose we have max_duration=100ba. If we define a scheduler with a warmup in batches (e.g. t_warmup=10ba), and apply an ssr, the warmup period will not be scaled, per our default behavior. However, if the warmup was defined in duration (e.g. t_warmup=0.1dur), and apply an ssr, then the warmup period will always be scaled (as we scale max_duration in the trainer beforehand). This PR respects that behavior, but perhaps we should attempt a fix separately.

Closes https://mosaicml.atlassian.net/browse/CO-668

@hanlint hanlint requested a review from codestar12 July 8, 2022 14:49
@hanlint hanlint requested a review from ravi-mosaicml July 8, 2022 20:14
@ravi-mosaicml ravi-mosaicml requested a review from jfrankle July 11, 2022 16:29
@ravi-mosaicml
Copy link
Contributor

ravi-mosaicml commented Jul 11, 2022

CC @jbloxham since he originally wrote the schedulers
CC @jfrankle for whether schedulers should be scaled for warmup -- I was under the impression that they shouldn't, but I don't know the research background here. Though I suppose if it's a non-default option, then that's fine.

Copy link
Contributor

@ravi-mosaicml ravi-mosaicml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Can you create a JIRA to track the fix for scale_warmup=False when the warmup period is specified in dur?

@hanlint hanlint merged commit 5b9f26f into mosaicml:dev Jul 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants