Conversation

Collaborator

@Tcc0403 Tcc0403 commented Apr 3, 2025

Summary

Resolves #650

`**kwargs` might contain `FlashAttentionKwargs`, so we should avoid passing them to the fused linear cross entropy (flce) directly.

This PR also adopts the latest `ForCausalLMLoss` and `fixed_cross_entropy` changes from huggingface/transformers, and renames `softcap` to `final_logit_softcapping` to match the upstream naming.
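The failure mode described above can be sketched in plain Python. This is an illustrative stand-in, not the actual Liger API: if a model's `**kwargs` (which may carry flash-attention metadata such as `cu_seqlens`) are forwarded blindly into a loss function that only accepts loss-related arguments, the call fails with a `TypeError`.

```python
# Hypothetical sketch; function and argument names are illustrative,
# not the real liger-kernel signatures.

def fused_linear_cross_entropy(hidden, weight, target, ignore_index=-100):
    """Stand-in for the fused loss; accepts only loss-related kwargs."""
    return 0.0  # actual loss computation elided

# FlashAttentionKwargs (e.g. cu_seqlens) leak into the model's kwargs
model_kwargs = {"ignore_index": -100, "cu_seqlens": [0, 4, 9]}

try:
    # Blindly forwarding **kwargs passes cu_seqlens through as well
    fused_linear_cross_entropy(None, None, None, **model_kwargs)
except TypeError as e:
    print("unexpected kwarg:", e)
```

This is why the PR stops forwarding `**kwargs` and instead names the accepted arguments explicitly.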

Testing Done

  • Hardware Type:
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

@Tcc0403 Tcc0403 requested a review from lancerts April 3, 2025 16:19
Collaborator Author

Tcc0403 commented Apr 3, 2025

Investigating why gemma3 multimodal test failed

@eljandoubi should we relax the tolerance further?

```python
target,
reduction=reduction,
ignore_index=ignore_index,
**kwargs,
```
Collaborator


Overall looks good; I don't understand why we are not passing kwargs to the Liger flce anymore.

Collaborator Author

@Tcc0403 Tcc0403 Apr 8, 2025


Mainly because `**kwargs` might contain `FlashAttentionKwargs`, we might accidentally pass them to flce when users set `_attn_implementation` to `flash_attention_2`.

We already capture the required args by declaring their names explicitly in the function signature, so the rest is not needed. flce supports many features (weight, label_smoothing, z-loss, ...), and I thought they were good to have, but most of them are not supported in transformers, so I'm removing them for now in case future changes in transformers break this again.
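The "declare what you need" pattern described here can be sketched as follows. The function and argument names are illustrative, not the actual Liger forward signature: loss-related arguments are named explicitly, so anything else (such as `FlashAttentionKwargs`) lands in `**kwargs` and is deliberately never forwarded to the fused loss kernel.

```python
# Hypothetical sketch of the kwargs-filtering pattern; names are
# illustrative, not the real liger-kernel API.

def lce_forward(hidden_states, labels, ignore_index=-100,
                final_logit_softcapping=None, **kwargs):
    # Required loss args are named explicitly in the signature.
    # Everything else (e.g. cu_seqlens from FlashAttentionKwargs)
    # is caught by **kwargs and deliberately NOT forwarded.
    loss_kwargs = {"ignore_index": ignore_index}
    if final_logit_softcapping is not None:
        loss_kwargs["final_logit_softcapping"] = final_logit_softcapping
    return loss_kwargs  # only these would reach the fused loss


# Flash-attention metadata is silently absorbed and dropped:
out = lce_forward(None, None, cu_seqlens=[0, 4], ignore_index=-1)
print(out)  # {'ignore_index': -1}
```

The trade-off is exactly what the comment describes: extra flce features not mirrored in transformers are dropped rather than forwarded, so future upstream changes to the kwargs plumbing cannot break the call.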

Collaborator


Cool, thanks for the explanation

Collaborator

@shivam15s shivam15s left a comment


lgtm!

@shivam15s shivam15s merged commit 7b0669b into main Apr 11, 2025
6 of 7 checks passed
@shivam15s shivam15s deleted the tcc/flce_unexpected_kwargs branch April 11, 2025 22:40
@sfc-gh-sbekman sfc-gh-sbekman mentioned this pull request Apr 21, 2025
Successfully merging this pull request may close these issues.

0.5.6 report bug