[JAX] Rework amax reduction over TPSP #2218

phu0ngng · 2025-09-30T15:47:41Z

Description

This PR adds a check that detects whether the input is partitioned along the sequence dimension, and triggers the amax reduction across TPSP only in that case.
With this change, using_global_amax_of_x is no longer needed.
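To illustrate why the reduction matters when the sequence dimension is partitioned, here is a minimal sketch, not the code from this PR. It emulates four sequence shards on a single device with jax.vmap and a named axis; the axis name "tpsp", the tensor shape, and the helper names are assumptions made for the example.

```python
import jax
import jax.numpy as jnp


def local_amax(x_shard):
    # amax as seen by one shard: max |x| over the shard's own elements only.
    return jnp.max(jnp.abs(x_shard))


def reduced_amax(x_shard):
    # Max-reduce the local amax over the named axis so that every shard
    # ends up with the amax of the full (gathered) tensor.
    return jax.lax.pmax(local_amax(x_shard), axis_name="tpsp")


# Toy activation of shape (batch=2, seq=8, hidden=4), values in [-20, 43].
x = jnp.arange(2 * 8 * 4, dtype=jnp.float32).reshape(2, 8, 4) - 20.0

# Emulate 4 sequence shards: (batch, shard, seq_per_shard, hidden).
shards = x.reshape(2, 4, 2, 4)

# Without the reduction, each shard computes a different amax.
per_shard = jax.vmap(local_amax, in_axes=1)(shards)

# With the reduction, all shards agree on the global amax.
reduced = jax.vmap(reduced_amax, in_axes=1, axis_name="tpsp")(shards)

full_amax = jnp.max(jnp.abs(x))
```

In this toy setup per_shard holds four different values, while every entry of reduced equals full_amax. If the sequence dimension were not partitioned, each device would already see the whole sequence and the extra collective would be redundant, which is the case the new check skips.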

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Phuong Nguyen <[email protected]>

jberchtold-nvidia

Left one question, pending that LGTM. Thanks!

jberchtold-nvidia · 2025-09-30T17:46:02Z

transformer_engine/jax/cpp_extensions/quantization.py

+            sequence_dim = 0 if batch_sequence_transpose else 1
+            # Run AR across TPSP only when tensor-sequence is detected in the input spec
+            if amax_scope is AmaxScope.TPSP and x_spec[sequence_dim] == gmesh.tpsp_resource:
                amax = lax_paral_op(amax, jax.lax.pmax, gmesh.tpsp_resource, mesh)


Previously, we would also all-reduce (AR) over gmesh.tp_resource. To keep the previous behavior, do we still need to reduce over TP when amax_scope is TPSP but the sequence dim is sharded over TP rather than TPSP? Or is it okay to reduce only when TPSP is active?

We all-gather (AG) the sequence dimension before the GEMM only in TPSP, so that is the case where the amax reduction is needed.

If users use the TP resource for TPSP, the GEMM op already emits warnings telling them to switch to TPSP.
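The condition discussed in this thread can be sketched as a standalone predicate. This is a simplified illustration, not the library code: the AmaxScope enum below is a hypothetical stand-in, the helper name should_reduce_over_tpsp is made up, and the mesh axis names "dp", "tp", and "tpsp" are assumptions.

```python
from enum import Enum


class AmaxScope(Enum):
    # Hypothetical stand-in for the enum used in the diff above.
    LOCAL = 1
    TPSP = 2


def should_reduce_over_tpsp(amax_scope, x_spec, tpsp_resource,
                            batch_sequence_transpose=False):
    """Return True only if the sequence dim of x is sharded over the TPSP axis."""
    # The sequence dim is 0 for (seq, batch, ...) layouts, otherwise 1.
    sequence_dim = 0 if batch_sequence_transpose else 1
    return amax_scope is AmaxScope.TPSP and x_spec[sequence_dim] == tpsp_resource


# Sequence dim sharded over the TPSP axis: reduce.
seq_on_tpsp = should_reduce_over_tpsp(AmaxScope.TPSP, ("dp", "tpsp", None), "tpsp")

# Sequence dim sharded over the TP axis instead (the case asked about): no reduction.
seq_on_tp = should_reduce_over_tpsp(AmaxScope.TPSP, ("dp", "tp", None), "tpsp")

# Sequence dim not sharded at all: no reduction.
seq_unsharded = should_reduce_over_tpsp(AmaxScope.TPSP, ("dp", None, "tp"), "tpsp")

# Transposed layout with the sequence dim first: reduce.
seq_first = should_reduce_over_tpsp(
    AmaxScope.TPSP, ("tpsp", "dp", None), "tpsp", batch_sequence_transpose=True
)
```

Under these assumptions only seq_on_tpsp and seq_first are True, which matches the answer above: the reduction follows the TPSP axis alone, and users who shard the sequence over TP rely on the existing GEMM warnings.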

phu0ngng · 2025-09-30T19:39:22Z

/te-ci JAX L1

phu0ngng added 2 commits September 30, 2025 14:53

rm using_global_amax_of_x

6fc41fa

Signed-off-by: Phuong Nguyen <[email protected]>

minor fix

53edd1c

Signed-off-by: Phuong Nguyen <[email protected]>

phu0ngng requested review from mingxu1067 and jberchtold-nvidia September 30, 2025 15:47

jberchtold-nvidia approved these changes Sep 30, 2025
