Add `scale_warmup` argument to schedulers
#1268
Merged
Based on a request from @codestar12, this PR adds the option to also scale the warmup period of our schedulers by `scale_schedule_ratio`.

For example:

With `scale_schedule_ratio=0.5`, this scheduler will warm up for 2 epochs, then step at 5 and 10 epochs.

`scale_warmup` defaults to `False` to preserve the current behavior.

During implementation, I observed an unintuitive default behavior in our warmup. Suppose we have `max_duration=100ba`. If we define a scheduler with a warmup in batches (e.g. `t_warmup=10ba`) and apply an SSR (`scale_schedule_ratio`), the warmup period will not be scaled, per our default behavior. However, if the warmup is defined as a fraction of the duration (e.g. `t_warmup=0.1dur`) and we apply an SSR, the warmup period will always be scaled (as we scale `max_duration` in the trainer beforehand). This PR respects that behavior, but perhaps we should attempt a fix separately.

Closes https://mosaicml.atlassian.net/browse/CO-668
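
The scaling rules described in this PR can be sketched in plain Python. This is an illustration only, not the code from this PR: `parse_time` and `resolve_time` are hypothetical helpers, and the `4ep` warmup, `10ep`/`20ep` milestones and `30ep` maximum duration are assumed inputs chosen to reproduce the 2, 5 and 10 epoch figures above. The model follows only what the description states: milestones scale with the SSR, explicit warmups scale only when `scale_warmup` is set, and `dur`-based times always follow the already-scaled `max_duration`.

```python
def parse_time(time_str):
    """Split a time string such as '10ba', '4ep' or '0.1dur' into (value, unit)."""
    for unit in ("dur", "ba", "ep"):
        if time_str.endswith(unit):
            return float(time_str[: -len(unit)]), unit
    raise ValueError(f"unsupported time string: {time_str!r}")


def resolve_time(time_str, max_duration, ssr, scale):
    """Return the effective time, in the unit of max_duration (hypothetical helper).

    max_duration is the unscaled value; the trainer is assumed to multiply it
    by ssr before any scheduler sees it, as the PR description says.
    """
    value, unit = parse_time(time_str)
    max_value, max_unit = parse_time(max_duration)
    if max_unit == "dur":
        raise ValueError("max_duration must be absolute, e.g. '100ba'")
    scaled_max = max_value * ssr
    if unit == "dur":
        # A fraction of the run always follows the scaled max_duration,
        # so it is scaled regardless of the `scale` flag.
        return value * scaled_max
    if unit != max_unit:
        raise ValueError("unit conversion is out of scope for this sketch")
    # An absolute time is scaled only when explicitly requested.
    return value * ssr if scale else value


if __name__ == "__main__":
    ssr = 0.5

    # Milestones scale with the SSR; the warmup scales only with scale_warmup=True.
    warmup = resolve_time("4ep", "30ep", ssr, scale=True)
    milestones = [resolve_time(m, "30ep", ssr, scale=True) for m in ("10ep", "20ep")]
    print("warmup:", warmup, "milestones:", milestones)

    # The inconsistency noted above, with scale_warmup left at False:
    print("10ba warmup:", resolve_time("10ba", "100ba", ssr, scale=False))
    print("0.1dur warmup:", resolve_time("0.1dur", "100ba", ssr, scale=False))
```

With `scale=False`, the `10ba` warmup stays at 10 batches while the `0.1dur` warmup shrinks to a tenth of the scaled 50-batch run, which is the asymmetry the description flags for a separate fix.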