Add support of Falcon-H1 models for liger kernels #874
base: main
Conversation
This PR enables the RoPE, RMSNorm, and cross entropy Liger kernels for Falcon-H1 models. Support for fused linear cross entropy and SwiGLU will be added later.
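For reference, a minimal usage sketch of how the patch could be applied before loading a Falcon-H1 checkpoint (the checkpoint id is an assumption, and the import path assumes the function gets exported from liger_kernel.transformers):

import torch
from transformers import AutoModelForCausalLM

from liger_kernel.transformers import apply_liger_kernel_to_falcon_h1

# Patch the Falcon-H1 modeling code before the model is instantiated so that
# RoPE, RMSNorm, and cross entropy use the Liger kernels.
apply_liger_kernel_to_falcon_h1(rope=True, rms_norm=True, cross_entropy=True)

model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/Falcon-H1-0.5B-Base",  # assumed checkpoint from the Falcon-H1 collection
    torch_dtype=torch.bfloat16,
)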
If you can't implement swiglu/flce in this impl, can you create a new issue to track this?
"smollm3": apply_liger_kernel_to_smollm3, | ||
"phi3": apply_liger_kernel_to_phi3, | ||
"paligemma": apply_liger_kernel_to_paligemma, | ||
"falcon_h1": apply_liger_kernel_to_falcon_h1, |
Can you add this to src/liger_kernel/transformers/__init__.py?
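A sketch of what that export could look like, assuming it follows the pattern of the other apply_liger_kernel_to_* functions already re-exported there:

# src/liger_kernel/transformers/__init__.py (sketch)
from liger_kernel.transformers.monkey_patch import apply_liger_kernel_to_falcon_h1  # noqa: F401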
        _patch_rms_norm_module(decoder_layer.post_mlp_layernorm)


def apply_liger_kernel_to_falcon_h1(
Can you add corresponding convergence and monkey patch tests?
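A rough sketch of what the monkey patch test could assert (the tiny config values, attribute names, and assertion style are assumptions; a convergence test would follow the existing test/convergence harness):

from transformers.models.falcon_h1 import modeling_falcon_h1

from liger_kernel.transformers.monkey_patch import apply_liger_kernel_to_falcon_h1
from liger_kernel.transformers.rms_norm import LigerRMSNorm


def test_apply_liger_kernel_to_falcon_h1_instance():
    # Tiny random-weight model so the test stays fast (config values are assumptions).
    config = modeling_falcon_h1.FalconH1Config(
        hidden_size=64,
        num_hidden_layers=2,
        num_attention_heads=4,
        vocab_size=128,
    )
    model = modeling_falcon_h1.FalconH1ForCausalLM(config)

    apply_liger_kernel_to_falcon_h1(model=model, rope=True, rms_norm=True, cross_entropy=True)

    # _patch_rms_norm_module patches modules in place, so the patched forward
    # should now come from LigerRMSNorm (attribute names are assumptions).
    for decoder_layer in model.model.layers:
        assert decoder_layer.post_mlp_layernorm.forward.__func__ is LigerRMSNorm.forward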
        cross_entropy (bool): Whether to apply Liger's cross entropy loss. Default is False.
        fused_linear_cross_entropy (bool):
            Whether to apply Liger's fused linear cross entropy loss. Default is True.
            `cross_entropy` and `fused_linear_cross_entropy` cannot both be True.
            If `fused_linear_cross_entropy` is True, the logits will not be materialized, which is more memory efficient.
        rms_norm (bool): Whether to apply Liger's RMSNorm. Default is True.
        swiglu (bool): Whether to apply Liger's SwiGLU MLP. Default is True.
        model (PreTrainedModel): The model instance to apply Liger kernels to, if the model has already been
Set correct defaults for flce and swiglu if not implemented
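One possible shape for that, sketched from the parameters shown in the docstring above (the rope parameter and the exact defaults are assumptions; the kernels this PR doesn't wire up yet default to off):

from transformers import PreTrainedModel


def apply_liger_kernel_to_falcon_h1(
    rope: bool = True,
    cross_entropy: bool = False,
    fused_linear_cross_entropy: bool = False,  # default off until the fused path is implemented
    rms_norm: bool = True,
    swiglu: bool = False,  # default off until LigerSwiGLUMLP support is verified
    model: PreTrainedModel = None,
) -> None:
    ...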
    if swiglu:
        modeling_falcon_h1.FalconH1MLP = LigerSwiGLUMLP

    if cross_entropy:
        if transformer_version >= version.parse(SUPPORTED_TRANSFORMER_VERSION):
            logger.info("Apply liger cross entropy")
            from transformers.loss.loss_utils import nn

            nn.functional.cross_entropy = liger_cross_entropy
        else:
            logger.warning(TRANSFORMER_DEPRECATION_WARNING)
            modeling_falcon_h1.CrossEntropyLoss = LigerCrossEntropyLoss

    # TODO: To be enabled
    if fused_linear_cross_entropy:
        if transformer_version >= version.parse(SUPPORTED_TRANSFORMER_VERSION):
            modeling_falcon_h1.FalconH1ForCausalLM.forward = llama_lce_forward
        else:  # if version < 4.46.1
            logger.warning(TRANSFORMER_DEPRECATION_WARNING)
            modeling_falcon_h1.FalconH1ForCausalLM.forward = llama_lce_forward_deprecated
Can you raise NotImplementedError in the case where swiglu and flce are set to true? Right now we're patching with incorrect modules (also in the case below where the model instance exists).
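A minimal sketch of the guard being requested, placed at the top of apply_liger_kernel_to_falcon_h1 before any patching happens:

if swiglu:
    raise NotImplementedError("LigerSwiGLUMLP is not yet supported for Falcon-H1 models.")
if fused_linear_cross_entropy:
    raise NotImplementedError("Fused linear cross entropy is not yet supported for Falcon-H1 models.")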
Thanks for all the comments @shimizust. Will address soon
Summary
This PR enables the RoPE, RMSNorm, and cross entropy Liger kernels for Falcon-H1 models.
Falcon-H1 models: https://huggingface.co/collections/tiiuae/falcon-h1-6819f2795bc406da60fab8df
Support for fused linear cross entropy and SwiGLU will be added later.
Testing Done
Verified fine-tuning of Falcon-H1 models with and without Liger kernels enabled; the loss plots match, and enabling the kernels yields a speedup as well as memory savings.
make test to ensure correctness
make checkstyle to ensure code style
make test-convergence to ensure convergence