```python
from composer.trainer import Trainer
from composer.algorithms import SeqLengthWarmup

# Create the trainer as usual, passing Sequence Length Warmup in the list of
# algorithms so that Composer applies it during training.
trainer = Trainer(model=model,
                  train_dataloader=train_dataloader,
                  max_duration='1ep',
                  algorithms=[SeqLengthWarmup()])

trainer.fit()
```
## Suggested Hyperparameters
We found that running Sequence Length Warmup for 30% of training (i.e., setting `duration=0.3`) provided the largest speedup that could still maintain full model quality on GPT-2 125M. We also recommend ensuring that the sequence length is always a multiple of eight to take advantage of hardware acceleration such as Tensor Cores.
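As a minimal sketch of this recommendation, using the `duration` argument referenced above:

```python
from composer.algorithms import SeqLengthWarmup

# Warm up the sequence length over the first 30% of training, after which
# training proceeds at the full sequence length.
seq_length_warmup = SeqLengthWarmup(duration=0.3)

# Pass it to the Trainer as in the example above, e.g.:
# Trainer(..., algorithms=[seq_length_warmup])
```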
## Technical Details
There are two options for shortening a training example to the desired sequence length (see the sketch after this list):
* Truncating the sentence, discarding everything beyond the desired sequence length.
* Segmenting the sentence, breaking it up into segments of the desired sequence length and making all segments into separate training examples.
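As a toy illustration of the difference between these two options (a hypothetical sketch, not Composer's implementation; `truncate` and `segment` are made-up helper names):

```python
def truncate(tokens: list, seq_len: int) -> list:
    """Keep only the first `seq_len` tokens; everything else is discarded."""
    return [tokens[:seq_len]]

def segment(tokens: list, seq_len: int) -> list:
    """Split one long example into consecutive chunks of at most `seq_len`
    tokens, each of which becomes a separate training example."""
    return [tokens[i:i + seq_len] for i in range(0, len(tokens), seq_len)]

example = list(range(20))      # a toy "tokenized" example with 20 tokens
print(truncate(example, 8))    # 1 example of length 8; 12 tokens discarded
print(segment(example, 8))     # 3 examples with lengths 8, 8, and 4
```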
## Attribution
[*Curriculum Learning: A Regularization Method for Efficient and Stable Billion-Scale GPT Model Pre-Training*](https://arxiv.org/abs/2108.06084) by Conglong Li, Minjia Zhang, and Yuxiong He. Posted to arXiv in 2021.