
npuichigo commented Jul 22, 2025

Summary

Testing Done

  • Hardware Type:
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

Tcc0403 (Collaborator) commented Jul 23, 2025

We don't want to pass **kwargs through because it also contains FlashAttentionKwargs, which resulted in issue #650.
See #651 (comment) for a more detailed explanation.
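Here is a minimal sketch of the failure mode described above. It assumes a loss function with a fixed keyword signature. `liger_flce_stub` and `forward_naive` are hypothetical stand-ins, not Liger Kernel's API, and `cu_seq_lens_q` and `max_length_q` are assumed examples of FlashAttentionKwargs keys. The sketch does not reproduce issue #650 itself. It only shows how attention-only kwargs leak into a loss call when **kwargs is forwarded wholesale.

```python
# Hypothetical sketch: these names are illustrative, not Liger Kernel's API.
from typing import Any, Optional


def liger_flce_stub(
    hidden: list[float],
    labels: list[int],
    ce_weight: Optional[list[float]] = None,
    label_smoothing: float = 0.0,
    lse_square_scale: float = 0.0,
) -> dict[str, Any]:
    """Stand-in for a fused linear cross-entropy call with a fixed signature.

    It computes no loss; it only echoes the options it accepted.
    """
    return {
        "ce_weight": ce_weight,
        "label_smoothing": label_smoothing,
        "lse_square_scale": lse_square_scale,
    }


def forward_naive(hidden: list[float], labels: list[int], **kwargs: Any) -> dict[str, Any]:
    # Forwards everything the model's forward() received, including
    # attention-only entries that the loss does not understand.
    return liger_flce_stub(hidden, labels, **kwargs)


# Assumed example keys in the style of FlashAttentionKwargs.
attention_kwargs = {"cu_seq_lens_q": [0, 4], "max_length_q": 4}

error_message: Optional[str] = None
try:
    forward_naive([0.1, 0.2], [1, 0], **attention_kwargs)
except TypeError as err:
    # The strict signature rejects the first unexpected keyword.
    error_message = str(err)
    print(f"TypeError: {error_message}")
```

If the loss accepted `**kwargs` instead of a fixed signature, the same leak would pass silently rather than raise, which can be harder to notice.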

npuichigo (Author)

So maybe we should pass ce_weight through explicitly instead.

Tcc0403 (Collaborator) commented Jul 24, 2025

Sure, it should be fine to pass through other arguments, such as ce_weight, label_smoothing, lse_square_scale, etc.
However, they are not natively supported in HF Transformers, so I'm not sure how these Liger FLCE-specific kwargs should be named.
https://github.com/huggingface/transformers/blob/947a37e8f5bc50bc0e9a77c0d16b038adcb056d0/src/transformers/utils/generic.py#L858
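One way to pass through only the loss-specific arguments suggested above is an explicit allow-list. This is a minimal sketch, not the approach settled on in this PR. `split_loss_kwargs` and `LOSS_KWARG_NAMES` are hypothetical. The kwarg names reuse the ones mentioned in this thread, although their final naming is still an open question.

```python
# Hypothetical sketch: filter a model's **kwargs down to loss-specific options
# before calling the fused loss. None of these helper names come from
# Liger Kernel or HF Transformers.
from typing import Any

# Loss options mentioned in the discussion; final names are still undecided.
LOSS_KWARG_NAMES = frozenset({"ce_weight", "label_smoothing", "lse_square_scale"})


def split_loss_kwargs(kwargs: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (loss_kwargs, other_kwargs) without mutating the input dict."""
    loss_kwargs = {k: v for k, v in kwargs.items() if k in LOSS_KWARG_NAMES}
    other_kwargs = {k: v for k, v in kwargs.items() if k not in LOSS_KWARG_NAMES}
    return loss_kwargs, other_kwargs


model_kwargs = {
    "cu_seq_lens_q": [0, 4],  # attention-only; must not reach the loss
    "max_length_q": 4,
    "label_smoothing": 0.1,
    "lse_square_scale": 1e-4,
}
loss_kwargs, other_kwargs = split_loss_kwargs(model_kwargs)
print(loss_kwargs)   # {'label_smoothing': 0.1, 'lse_square_scale': 0.0001}
print(other_kwargs)  # {'cu_seq_lens_q': [0, 4], 'max_length_q': 4}
```

An allow-list keeps unknown keys, such as attention kwargs, away from the loss by default. The trade-off is that every new loss option must be added to the set explicitly.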


3 participants