Flash Attention v3 #36190

hlky · 2025-02-14T08:01:42Z

What does this PR do?

Replaces #33522 to avoid conflicts and allow those using it to continue while we get it updated for #35235

Initial commit of this PR adds auxiliary code so we can discuss the core FAv3 integration.

cc @ArthurZucker

Integrate FAv3 into _flash_attention_forward/flash_attention_forward as before or create new functions?
Some models still have FlashAttention2 classes, is refactoring all models to use the new style planned? Integrate FAv3 as before or do the refactor in this PR?

Also to check:

Status of dropout, softcap etc
Status of FP8
Packaging

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev · 2025-02-14T08:27:23Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

vasqu

Just a heads-up to warn/inform you about the current status of fa3:

sm80 and up is supported (A100 etc.)
(arm64 is supported now I think, not sure if it was before)
it doesn't seem like dropout will be supported ( Dao-AILab/flash-attention#1377 )
(barebones) padding is included in hopper ( https://github.com/Dao-AILab/flash-attention/blob/main/hopper/padding.py )
seqused_(q/k) is now forced in the varlen interface ( https://github.com/Dao-AILab/flash-attention/blob/fa445ff6c215026438cca496a97242b8269aa428/hopper/flash_attn_interface.py#L566-L567 ), but tbh I'm not sure whether this was intended ( opened an issue at Dao-AILab/flash-attention#1495 ); the newest main shouldn't require it anymore
qkv packed exists for the base fa3 forward (but not the others)
softcapping should be supported now ( e.g. https://github.com/Dao-AILab/flash-attention/blob/fa445ff6c215026438cca496a97242b8269aa428/hopper/flash_attn_interface.py#L576 )
fp8 backward doesn't look like it will be added soon ( Dao-AILab/flash-attention#1420 (comment) )
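
For readers unfamiliar with the varlen interface mentioned above: it operates on sequences packed into one tensor and addressed through cumulative sequence lengths. A minimal, library-free sketch of that bookkeeping follows. `build_cu_seqlens` is a hypothetical helper for illustration, not a function from flash-attention or transformers, and it uses plain lists where real code would use integer tensors.

```python
from itertools import accumulate


def build_cu_seqlens(seq_lens):
    """Return cumulative sequence lengths, starting at 0.

    Hypothetical helper: for per-sequence lengths [3, 2, 4] the packed
    batch holds 9 tokens and sequence i spans cu[i]:cu[i + 1].
    """
    if any(n < 0 for n in seq_lens):
        raise ValueError("sequence lengths must be non-negative")
    return [0, *accumulate(seq_lens)]


seq_lens = [3, 2, 4]
cu_seqlens = build_cu_seqlens(seq_lens)
max_seqlen = max(seq_lens)
print(cu_seqlens, max_seqlen)  # [0, 3, 5, 9] 4
```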

vasqu · 2025-02-14T18:22:00Z

src/transformers/modeling_utils.py

+        if torch.version.cuda:
+            compute_capability = torch.cuda.get_device_capability()
+            major, _ = compute_capability
+            if major < 9:


A100 support has been recently added Dao-AILab/flash-attention#1481 (comment)
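
A sketch of how the gate in the diff above could be relaxed if sm80 (A100) support is relied upon. `fa3_capability_ok` and its `min_major` default are assumptions for illustration, not code from this PR; in real code the capability tuple would come from `torch.cuda.get_device_capability()`, as in the diff.

```python
def fa3_capability_ok(capability, min_major=8):
    """Return True if a (major, minor) compute capability meets the minimum.

    Hypothetical helper. min_major=8 reflects the comment that sm80 (A100)
    and up are supported; the diff above effectively uses a minimum of 9.
    """
    major, _minor = capability
    return major >= min_major


# In real code the tuple would come from torch.cuda.get_device_capability().
for cap in [(7, 5), (8, 0), (9, 0)]:
    print(cap, fa3_capability_ok(cap))
```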

vasqu · 2025-02-14T19:01:16Z

cc @bn999 if you're interested in the progress

bn999 · 2025-02-14T21:23:10Z

@vasqu Yup, I'm following. Good stuff.

hlky · 2025-02-18T09:56:16Z

Thanks for the info @vasqu

hlky · 2025-02-24T10:16:03Z

Gentle ping @ArthurZucker

Integrate FAv3 into _flash_attention_forward/flash_attention_forward as before or create new functions?
Some models still have FlashAttention2 classes, is refactoring all models to use the new style planned? Integrate FAv3 as before or do the refactor in this PR?

jianguoz · 2025-03-14T21:23:09Z

Hi @ArthurZucker @hlky @vasqu @muellerzr , thanks for the great efforts to integrate Flash Attention 3 😁. Do we have any plans to merge this PR?

sam-h-bean · 2025-03-21T14:56:58Z

Hey, quick thing here @hlky: if you have FA3 installed but not FA2 (which I believe is a valid way it is used in other repos like TE), you end up failing the is_flash_attn_2_available check and get "_flash_attention_forward is not defined" even though the check-and-enable FA3 function passes. Not sure if this is intentional or a bug, but if it is intentional, a better guard could help tell people that both FA2 and FA3 are required.

hlky · 2025-03-21T15:06:01Z

Hi @sam-h-bean. At the time this PR was started (more specifically, the original PR #33522) pad functions were not available in FAv3, therefore FAv2 was required. As per #36190 (review) this is likely no longer required and will be updated when this PR is finished. At the moment we are waiting for comments from a core-maintainer, @ArthurZucker, regarding #36190 (comment).
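
A minimal sketch of the clearer guard suggested above, assuming FAv2 is still needed for the pad functions. The function names, the message wording, and the module names passed to `find_spec` are assumptions for illustration; they are not taken from this PR.

```python
import importlib.util


def fa3_backend_error(has_fa2, has_fa3, fa3_needs_fa2=True):
    """Return an error message for an unusable FA3 setup, or None if usable.

    Hypothetical helper; fa3_needs_fa2 models the situation described
    above, where FAv3 relied on FAv2 for the pad functions.
    """
    if not has_fa3:
        return "Flash Attention 3 is not installed."
    if fa3_needs_fa2 and not has_fa2:
        return (
            "Flash Attention 3 is installed, but Flash Attention 2 is also "
            "required for the padding helpers. Please install both."
        )
    return None


def module_installed(name):
    # find_spec returns None for a missing top-level module.
    return importlib.util.find_spec(name) is not None


# Module names below are assumptions; adjust to the actual package layout.
message = fa3_backend_error(
    has_fa2=module_installed("flash_attn"),
    has_fa3=module_installed("flash_attn_interface"),
)
print(message or "FA3 backend looks usable")
```

Returning a message instead of raising keeps the decision about whether to fail hard or fall back to another attention implementation with the caller.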

ArthurZucker

Answering!

Integrate FAv3 into _flash_attention_forward/flash_attention_forward as before or create new functions?

I think if API changes are not too big we can use the same

Some models still have FlashAttention2 classes, is refactoring all models to use the new style planned? Integrate FAv3 as before or do the refactor in this PR?

would be nice to have in a separate PR!

Happy to merge as is!

ArthurZucker · 2025-03-24T10:44:18Z

examples/modular-transformers/modeling_dummy.py

    _supports_flash_attn_2 = True
+    _supports_flash_attn_3 = True


supporting 2 or 3 is equivalent to the model here so we can just keep 2 <=> 3?
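
One way to read the suggestion above is that FA3 support could be derived from the FA2 flag instead of being declared on every model. The sketch below assumes exactly that; `supports_flash_attn_3` and the two dummy classes are hypothetical and not part of this PR.

```python
class DummyModel:
    # Only the FA2 flag is declared on the model class.
    _supports_flash_attn_2 = True


class LegacyModel:
    _supports_flash_attn_2 = False


def supports_flash_attn_3(model_cls):
    """Hypothetical helper: an explicit FA3 flag wins, else fall back to FA2."""
    fa2 = getattr(model_cls, "_supports_flash_attn_2", False)
    return getattr(model_cls, "_supports_flash_attn_3", fa2)


print(supports_flash_attn_3(DummyModel))   # True
print(supports_flash_attn_3(LegacyModel))  # False
```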

ArthurZucker · 2025-05-16T09:38:25Z

@hlky sorry I probably forgot to merge 😓 don't worry we'll push through and add support!

hlky · 2025-05-16T09:50:04Z

@ArthurZucker lol it's cool, the PR wasn't finished because I had been waiting for your response, didn't have time in Paris then I was fired so I closed it 🤷‍♂️

Implements fwd and tests for Flash Attention 3 https://github.com/Dao-AILab/flash-attention/commits/main/hopper

- Includes checks for dropout>0 and ALiBi in `modeling_utils.PreTrainedModel._check_and_enable_flash_attn_3` (Dropout will likely be supported soon, so this will need to be updated and `modeling_flash_attention_utils._flash_attention_forward` at the `if _IS_FLASH_ATTN_3_AVAILABLE: ...`)

An example Llama implementation is included in `modeling_llama.py` but other models would still need to be updated

Based on huggingface#36190 which has model implementations and examples which could be merged

* Support `flash_attn_3`

  Implements fwd and tests for Flash Attention 3 https://github.com/Dao-AILab/flash-attention/commits/main/hopper

  - Includes checks for dropout>0 and ALiBi in `modeling_utils.PreTrainedModel._check_and_enable_flash_attn_3` (Dropout will likely be supported soon, so this will need to be updated and `modeling_flash_attention_utils._flash_attention_forward` at the `if _IS_FLASH_ATTN_3_AVAILABLE: ...`)

  An example Llama implementation is included in `modeling_llama.py` but other models would still need to be updated

  Based on #36190 which has model implementations and examples which could be merged
* Add tests for Flash Attention 2 and 3 parity
* ci fix
* FA2 compatibility
  - `_prepare_flash_attention_from_position_ids` -> `prepare_fa2_from_position_ids`
  - Remove bettertransformer check in Flash Attention 3
  - Merge tests
  - Add licensing
* ci fix
* Test naming consistency
* ci fix
* Deprecation warning for `prepare_fa2_from_position_ids`
* ci fix
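
The dropout and ALiBi checks described in the commit message can be sketched roughly as follows. This is an illustration only: `check_fa3_compatible` and its parameters are hypothetical, and the actual logic lives in `_check_and_enable_flash_attn_3`.

```python
def check_fa3_compatible(attention_dropout=0.0, uses_alibi=False, training=False):
    """Raise ValueError for settings the commit message says are rejected.

    Hypothetical helper. Treating dropout as a problem only in training
    mode is an assumption of this sketch.
    """
    if uses_alibi:
        raise ValueError("Flash Attention 3 does not support ALiBi here.")
    if training and attention_dropout > 0:
        raise ValueError("Flash Attention 3 does not support dropout > 0.")


check_fa3_compatible(attention_dropout=0.0)  # passes silently
try:
    check_fa3_compatible(attention_dropout=0.1, training=True)
except ValueError as err:
    print(err)
```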

hlky added 6 commits February 14, 2025 06:53

_supports_flash_attn_3

efa7189

modeling_utils/import_utils

b1fc52e

config._attn_implementation/_use_flash_attention_3

9a80143

testing_utils

a526189

make

a9717e7

sliding_window

1b5f20c

vasqu reviewed Feb 14, 2025

hlky added 3 commits February 18, 2025 09:47

Merge branch 'main' into fav3

aeb1d55

Update modeling_granitemoe.py

ea85044

Update modeling_granitemoe.py

af0d015

Merge branch 'main' into fav3

823386a

Merge remote-tracking branch 'upstream/main' into fav3

c3aea43

ArthurZucker approved these changes Mar 24, 2025

hlky closed this Apr 15, 2025

hlky deleted the fav3 branch April 15, 2025 12:28

EduardDurech mentioned this pull request Jun 23, 2025

Support for Flash Attention 3 #38972

Merged
