Fix return_bias option in LayerNormLinear and LayerNormMLP #1569
Conversation
Signed-off-by: Przemek Tredak <[email protected]>
Signed-off-by: Przemek Tredak <[email protected]>
/te-ci pytorch
for more information, see https://pre-commit.ci
Signed-off-by: Przemek Tredak <[email protected]>
/te-ci pytorch
Looks correct, especially as a quick bugfix.
`return_bias` is spaghetti code, so I'll summarize my understanding for future reference. Mcore performs some kernel fusions like bias-dropout-add and bias-add, which involve separating the bias compute from the linear GEMM. For an example, see this Transformer layer residual connection:
https://github.com/NVIDIA/Megatron-LM/blob/34013dbd37e94e17f1434d71b6ce9705e0acf6ca/megatron/core/transformer/transformer_layer.py#L451-L453
However, Mcore would like to stay consistent with PyTorch's convention of including the bias with the linear module (I presume for checkpointing). `return_bias` tells TE to not actually apply the bias, but to return it for use by Mcore. For some reason we still pass the bias into the autograd function, but then have a bunch of checks everywhere so that it isn't ever used in the forward or backward pass. In the future we should just pass `bias=None` into the autograd function if `return_bias=True`, so we can remove all the redundant logic with `use_bias=False`.
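For future readers, here is a minimal sketch of the pattern described above. It assumes Transformer Engine's PyTorch `LayerNormLinear` returns an `(output, bias)` pair when `return_bias=True`; the `bias_dropout_add` helper below is a simplified stand-in for Mcore's fusion, not its actual implementation:

```python
import torch
import transformer_engine.pytorch as te

# With return_bias=True the module still owns the bias parameter (so the
# checkpoint looks like a regular linear layer), but it does not add the
# bias to the GEMM output; the bias tensor is handed back to the caller.
layer = te.LayerNormLinear(1024, 1024, bias=True, return_bias=True).cuda()

x = torch.randn(8, 1024, device="cuda")
out, bias = layer(x)  # bias is returned unapplied

# Simplified stand-in for Mcore's bias-dropout-add fusion: the bias is
# applied together with dropout and the residual add, outside the module.
def bias_dropout_add(out, bias, residual, p=0.1):
    return residual + torch.nn.functional.dropout(out + bias, p=p)

y = bias_dropout_add(out, bias, residual=x)
```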
Signed-off-by: Przemek Tredak <[email protected]>
Signed-off-by: Przemek Tredak <[email protected]>
for more information, see https://pre-commit.ci
/te-ci pytorch
Moving `return_bias` out of the autograd function is a big improvement.
This reverts commit 2e20a92. Signed-off-by: Przemek Tredak <[email protected]>
LGTM
/te-ci pytorch
Co-authored-by: Tim Moon <[email protected]> Signed-off-by: Przemyslaw Tredak <[email protected]>
Signed-off-by: Przemek Tredak <[email protected]>
for more information, see https://pre-commit.ci
Signed-off-by: Przemek Tredak <[email protected]>
/te-ci pytorch
/te-ci jax
The CI failures are the expected failures from paged attention with cuDNN 9.8. Merging.
* Do not apply bias when apply_bias is False
  Signed-off-by: Przemek Tredak <[email protected]>
* Bwd fix for LNMLP and tests
  Signed-off-by: Przemek Tredak <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Fix for the dbias calculation
  Signed-off-by: Przemek Tredak <[email protected]>
* Improve tests and cleaning the logic
  Signed-off-by: Przemek Tredak <[email protected]>
* Tightened test tolerances a little
  Signed-off-by: Przemek Tredak <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Revert "Tightened test tolerances a little"
  This reverts commit 2e20a92.
  Signed-off-by: Przemek Tredak <[email protected]>
* Update tests/pytorch/test_numerics.py
  Co-authored-by: Tim Moon <[email protected]>
  Signed-off-by: Przemyslaw Tredak <[email protected]>
* Fix the Gelu Aux type
  Signed-off-by: Przemek Tredak <[email protected]>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Remove use_fc1_bias option
  Signed-off-by: Przemek Tredak <[email protected]>

--------

Signed-off-by: Przemek Tredak <[email protected]>
Signed-off-by: Przemyslaw Tredak <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>
Hi @Marks101, my apologies about that :-(. I just added a note about it to the 2.0/2.1 release notes on GitHub.
@ptrendx I noticed "/opt/transformerengine/tests/pytorch/test_numerics.py:1147: PytestCollectionWarning: cannot collect test class 'TestReturnBiasModule' because it has a __init__ constructor (from: tests/pytorch/test_numerics.py)" in the test logs.
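For reference, pytest's default collection pattern picks up any class named `Test*` and then warns and skips it if the class defines `__init__`. A sketch of the two usual ways to silence the warning (the constructor arguments here are illustrative, not the ones used in `test_numerics.py`):

```python
import torch

# Option 1: rename the helper so it no longer matches pytest's default
# "Test*" class pattern and is never considered for collection.
class ReturnBiasModule(torch.nn.Module):
    def __init__(self, layer_cls, **kwargs):
        super().__init__()
        self.layer = layer_cls(**kwargs, return_bias=True)

# Option 2: keep the name but opt the class out of collection explicitly.
class TestReturnBiasModule(torch.nn.Module):
    __test__ = False  # pytest ignores classes marked with __test__ = False
    def __init__(self, layer_cls, **kwargs):
        super().__init__()
        self.layer = layer_cls(**kwargs, return_bias=True)
```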
Description
The return_bias option was silently ignored in LayerNormLinear and LayerNormMLP after the 2.0 refactor.
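A minimal sketch of how the fixed behavior can be checked, assuming that with `return_bias=True` the module returns the unapplied bias and that adding it back should match an identical module applying the bias internally; module sizes and the comparison are illustrative, not the PR's actual test:

```python
import torch
import transformer_engine.pytorch as te

torch.manual_seed(0)
x = torch.randn(8, 1024, device="cuda")

# Reference module: bias applied inside the forward pass.
ref = te.LayerNormMLP(1024, 4096, bias=True)
y_ref = ref(x)

# Same weights, but the FC2 bias is returned instead of being applied.
mod = te.LayerNormMLP(1024, 4096, bias=True, return_bias=True)
mod.load_state_dict(ref.state_dict())
y, bias = mod(x)

# If return_bias were silently ignored, y would already include the bias
# and this comparison would fail.
torch.testing.assert_close(y + bias, y_ref)
```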
Type of change
Changes
Please list the changes introduced in this PR:
Checklist: