
Conversation

ptrendx
Member

@ptrendx ptrendx commented Mar 13, 2025

Description

The return_bias option was silently ignored in LayerNormLinear and LayerNormMLP after the 2.0 refactor.
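A hedged sketch of how the symptom can be checked (the modules and the return_bias flag are TE's real API; the sizes and the reference comparison are illustrative):

```python
import torch
import transformer_engine.pytorch as te

torch.manual_seed(0)
layer = te.LayerNormLinear(64, 64, bias=True, return_bias=True, device="cuda")
ref = te.LayerNormLinear(64, 64, bias=True, return_bias=False, device="cuda")
ref.load_state_dict(layer.state_dict())  # same weights; ref applies its bias internally

x = torch.randn(4, 64, device="cuda")
out, bias = layer(x)  # with return_bias=True, the bias is returned, not applied

# When return_bias is honored, adding the returned bias once matches the
# reference; with the regression, the module applied the bias anyway.
torch.testing.assert_close(out + bias, ref(x))
```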

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Do not apply the bias when apply_bias is False in LayerNormLinear and LayerNormMLP
  • Fix the backward pass and dbias calculation for LayerNormMLP
  • Fix the GeLU auxiliary output type
  • Remove the use_fc1_bias option
  • Improve tests and clean up the bias-handling logic

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@ptrendx
Member Author

ptrendx commented Mar 13, 2025

/te-ci pytorch

@ptrendx
Member Author

ptrendx commented Mar 13, 2025

/te-ci pytorch

@ptrendx ptrendx requested review from ksivaman and timmoon10 March 13, 2025 21:24
@ptrendx ptrendx marked this pull request as ready for review March 13, 2025 21:27
Collaborator

@timmoon10 timmoon10 left a comment


Looks correct, especially as a quick bugfix.

return_bias is spaghetti code, so I'll summarize my understanding for future reference. Mcore performs some kernel fusions like bias-dropout-add and bias-add, which involve separating the bias compute from the linear GEMM. For an example, see this Transformer layer residual connection:
https://github.com/NVIDIA/Megatron-LM/blob/34013dbd37e94e17f1434d71b6ce9705e0acf6ca/megatron/core/transformer/transformer_layer.py#L451-L453
However, Mcore would like to stay consistent with PyTorch's convention of including the bias with the linear module (I presume for checkpointing). return_bias tells TE to not actually apply the bias, but to return it for use by Mcore. For some reason we still pass the bias into the autograd function, but then have a bunch of checks everywhere so that it isn't ever used in the forward or backward pass. In the future we should just pass bias=None into the autograd function if return_bias=True, so we can remove all the redundant logic with use_bias=False.
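For future reference, a minimal sketch of that calling pattern (the module and return_bias are TE's real API; the fusion is written out naively here rather than with Mcore's fused bias-dropout-add kernel):

```python
import torch
import transformer_engine.pytorch as te

proj = te.LayerNormLinear(1024, 1024, bias=True, return_bias=True, device="cuda")
x = torch.randn(8, 1024, device="cuda")
residual = torch.randn_like(x)

# TE skips the bias in its GEMM epilogue and hands it back to the caller...
out, bias = proj(x)

# ...so the caller can fuse it into a later kernel, as Mcore does with
# bias-dropout-add on the residual connection.
out = torch.nn.functional.dropout(out + bias, p=0.1, training=True) + residual
```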

@ptrendx
Member Author

ptrendx commented Mar 15, 2025

/te-ci pytorch

Collaborator

@timmoon10 timmoon10 left a comment


Moving the return_bias out of the autograd function is a big improvement.
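A hypothetical sketch of that refactor with a toy autograd function, to illustrate the pattern (none of these names are TE's actual internals):

```python
import torch

class _ToyLinearFunc(torch.autograd.Function):
    # Toy stand-in for TE's fused autograd function.
    @staticmethod
    def forward(ctx, inp, weight, bias):
        ctx.save_for_backward(inp, weight)
        ctx.has_bias = bias is not None
        out = inp @ weight.t()
        if bias is not None:  # bias handled only when actually passed in
            out = out + bias
        return out

    @staticmethod
    def backward(ctx, grad_out):
        inp, weight = ctx.saved_tensors
        grad_bias = grad_out.sum(0) if ctx.has_bias else None
        return grad_out @ weight, grad_out.t() @ inp, grad_bias

class ToyLinear(torch.nn.Module):
    def __init__(self, in_features, out_features, return_bias=False):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(out_features, in_features))
        self.bias = torch.nn.Parameter(torch.zeros(out_features))
        self.return_bias = return_bias

    def forward(self, inp):
        # Pass bias=None into the autograd function when the caller wants
        # the bias back, instead of threading use_bias=False checks through
        # the forward and backward passes.
        bias = None if self.return_bias else self.bias
        out = _ToyLinearFunc.apply(inp, self.weight, bias)
        # When returned, the bias gets its gradient through the caller's add.
        return (out, self.bias) if self.return_bias else out
```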

Revert "Tightened test tolerances a little"

This reverts commit 2e20a92.

Signed-off-by: Przemek Tredak <[email protected]>
Member

@ksivaman ksivaman left a comment


LGTM

@ptrendx ptrendx added the 2.2.0 label Mar 17, 2025
@ptrendx
Member Author

ptrendx commented Mar 17, 2025

/te-ci pytorch

ptrendx and others added 4 commits March 17, 2025 11:16

  • Update tests/pytorch/test_numerics.py (Co-authored-by: Tim Moon <[email protected]>, Signed-off-by: Przemyslaw Tredak <[email protected]>)
  • Fix the Gelu Aux type (Signed-off-by: Przemek Tredak <[email protected]>)
  • [pre-commit.ci] auto fixes from pre-commit.com hooks
  • Remove use_fc1_bias option (Signed-off-by: Przemek Tredak <[email protected]>)
@ptrendx
Member Author

ptrendx commented Mar 18, 2025

/te-ci pytorch

@ptrendx
Member Author

ptrendx commented Mar 18, 2025

/te-ci jax

@ptrendx
Member Author

ptrendx commented Mar 18, 2025

The CI failures are the expected failures coming from paged attention with cuDNN 9.8. Merging.

@ptrendx ptrendx merged commit 99f4067 into NVIDIA:main Mar 18, 2025
18 of 23 checks passed
@Marks101
Contributor

Marks101 commented Apr 7, 2025

Hi @ptrendx and @ksivaman, could you maybe mention this in the known issues for versions 2.0 and 2.1? We ran into this and it caused us some trouble 🙈

lhb8125 pushed a commit to lhb8125/TransformerEngine that referenced this pull request Apr 8, 2025
* Do not apply bias when apply_bias is False

Signed-off-by: Przemek Tredak <[email protected]>

* Bwd fix for LNMLP and tests

Signed-off-by: Przemek Tredak <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix for the dbias calculation

Signed-off-by: Przemek Tredak <[email protected]>

* Improve tests and cleaning the logic

Signed-off-by: Przemek Tredak <[email protected]>

* Tightened test tolerances a little

Signed-off-by: Przemek Tredak <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "Tightened test tolerances a little"

This reverts commit 2e20a92.

Signed-off-by: Przemek Tredak <[email protected]>

* Update tests/pytorch/test_numerics.py

Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: Przemyslaw Tredak <[email protected]>

* Fix the Gelu Aux type

Signed-off-by: Przemek Tredak <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove use_fc1_bias option

Signed-off-by: Przemek Tredak <[email protected]>

---------

Signed-off-by: Przemek Tredak <[email protected]>
Signed-off-by: Przemyslaw Tredak <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>
@ptrendx
Member Author

ptrendx commented Apr 10, 2025

Hi @Marks101, my apologies about that :-(. I just added a note about it to the 2.0/2.1 release notes on GitHub.

@pggPL
Collaborator

pggPL commented Aug 6, 2025

@ptrendx I noticed "/opt/transformerengine/tests/pytorch/test_numerics.py:1147: PytestCollectionWarning: cannot collect test class 'TestReturnBiasModule' because it has a `__init__` constructor (from: tests/pytorch/test_numerics.py)" in the test logs.
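If that class is a helper module rather than an actual test case, pytest's `__test__ = False` convention should silence the collection attempt (alternatively, rename it so it doesn't start with `Test`). A hypothetical sketch, with an illustrative class body:

```python
import torch

class TestReturnBiasModule(torch.nn.Module):
    # Helper wrapper, not a pytest test case; __test__ = False tells pytest
    # not to collect it (collection warns because nn.Module defines __init__).
    __test__ = False

    def __init__(self, hidden_size: int = 64):
        super().__init__()
        self.linear = torch.nn.Linear(hidden_size, hidden_size)

    def forward(self, x):
        return self.linear(x)
```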
