Conversation

@0xDELUXA (Contributor) commented Sep 19, 2025

Crashes occasionally occur at high resolutions, with or without PyTorch attention in the VAE. For normal image sizes, though, PyTorch attention is faster.

Tested on Windows (native PyTorch, not WSL) with a gfx1200 card: with this change it uses PyTorch attention in the VAE. RDNA 3 and earlier cards continue to use split attention as before.

Partially reverts “Disable PyTorch attention in VAE for AMD.” (commit 1cd6cd6) for RDNA 4.
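
For reviewers skimming this, here is a minimal sketch of the gating this PR implies. It is not the actual model_management.py code; `is_amd` and `gfx_arch` are hypothetical stand-ins for values ComfyUI already derives from the device:

```python
# Sketch only, not the real ComfyUI code: allow PyTorch attention in the VAE
# for RDNA 4 (gfx12xx), while RDNA 3 and earlier keep split attention.
def vae_uses_pytorch_attention(is_amd: bool, gfx_arch: str) -> bool:
    if not is_amd:
        return True                      # non-AMD behaviour is unchanged by this PR
    return gfx_arch.startswith("gfx12")  # e.g. "gfx1200"; older archs stay on split attention
```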

@0xDELUXA changed the title from “Enable pytorch attention in VAE for AMD RDNA 4” to “Enable PyTorch attention in VAE for AMD RDNA 4” on Sep 19, 2025
@qawery-just-sad

Potential duplicate of #8289

@0xDELUXA (Contributor, Author)

> Potential duplicate of #8289

Could be, but this only affects RDNA 4. Based on my testing, RDNA 4 benefits from PyTorch attention here. I can't speak for the other archs, though.

@A-Temur commented Sep 22, 2025

@0xDELUXA
Which PyTorch/ROCm version are you currently using?
My current setup:
Ubuntu 24.04.3
ROCm 7.0.0
torch 2.8.0 (+vision+triton...)
GPU: 7900 GRE (RDNA 3)

ComfyUI automatically uses PyTorch attention, and so far I've had no issues.

But perhaps I haven't come across your specific workflow yet. If you don't mind sharing it, I could test it on my device and give you some feedback.

@0xDELUXA (Contributor, Author) commented Sep 22, 2025

@A-Temur
Platform: Windows
Python version: 3.11.9
PyTorch version: 2.10.0a0+rocm7.0.0rc20250918
AMD arch: gfx1200
ROCm version: (7, 1)

Without this PR I got:
Using pytorch attention (because I’m using the --use-pytorch-cross-attention flag)
but then, automatically:
Using split attention in VAE

After merging this PR:
Using pytorch attention in VAE

As you can see, I'm using a TheRock wheel for Windows. It's in a nightly state and not yet available on pytorch.org.
I think we can't really compare Linux and Windows performance in this case.

Also, does your console say: Using pytorch attention in VAE?

@A-Temur commented Sep 22, 2025

@0xDELUXA
Now I see it:
My console also prints out "Using pytorch attention in VAE", but only when I start Comfy. After that I only get "Using split attention in VAE".

Prior to that I was using ROCm 6.4.3 with PyTorch 2.5.1 in a Docker setup (on Fedora). I didn't get the message about PyTorch attention, and the performance was very poor (very long loading times, especially before the KSampler starts and before VAE decode).

Since switching to ROCm 7 with PyTorch 2.8 on Ubuntu (no Docker), the performance increase has been huge and I've had no issues so far. I would highly recommend the officially recommended Ubuntu/RHEL installation; in my experience, ROCm + Radeon didn't do well on Windows + WSL or other unsupported Linux distros.

On what specific models/workflows do you get the mentioned crash (without this PR)?
I'm curious to check if it's the same for me.

@0xDELUXA (Contributor, Author) commented Sep 22, 2025

I'm not talking about Windows + WSL; it's native PyTorch on Windows.

@A-Temur commented Sep 22, 2025

@0xDELUXA
OK, but I'm now asking for the third time: please share your workflow/model so others (including me) can check whether the crash only happens on your specific setup.

@0xDELUXA (Contributor, Author)

> On what specific models/workflows do you get the mentioned crash (without this PR)?
> I'm curious to check if it's the same for me.

This specific PR doesn't do anything about the VAE crashes; it just makes AMD use the better attention optimization in the VAE. I don't think you'll get the crashes on Linux at all. This is a Windows thing, AFAIK.

@0xDELUXA (Contributor, Author) commented Sep 22, 2025

> @0xDELUXA
> OK, but I'm now asking for the third time: please share your workflow/model so others (including me) can check whether the crash only happens on your specific setup.

https://github.com/comfyanonymous/ComfyUI/blob/master/comfy%2Fmodel_management.py#L1116

I can share my workflow later, when I'm back online.

@A-Temur commented Sep 22, 2025

I've now tested the performance with PyTorch attention forcefully enabled and without it (vanilla Comfy) on a simple SDXL image-generation workflow:

PyTorch attention disabled (default):
first generation: 18.22 seconds
subsequent generations: 14 seconds on average

PyTorch attention enabled:
first generation: 25.37 seconds
subsequent generations: 15.4 seconds on average
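
For anyone repeating this comparison, here is a rough timing sketch. It is not how the numbers above were produced; `run_once` is a hypothetical stand-in for whatever triggers one full generation:

```python
import time
import torch

def average_seconds(run_once, warmup=1, repeats=5):
    """Average wall-clock time of one generation; run_once() is a placeholder callable."""
    for _ in range(warmup):
        run_once()                    # let MIOpen / attention kernels warm up first
    torch.cuda.synchronize()          # ROCm GPUs are driven through the torch.cuda API
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        torch.cuda.synchronize()      # wait for queued GPU work before stopping the clock
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)
```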

@RandomGitUser321 (Contributor) commented Sep 22, 2025

Might be partially related, but are you also making sure to run set TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 before running main.py? You have to run that command with the newer TheRock Windows wheels (I think it's mentioned somewhere in the million PRs and issues); otherwise I don't think --use-pytorch-cross-attention works correctly (it won't use AOTriton if that isn't set beforehand). Or at least that's what I have to do with this gfx110X dGPU on Windows 11 24H2.

Also, I think MIOpen might be playing a part in the VAE encode/decode issues. With set MIOPEN_FIND_MODE=FAST it works quickly, but probably without all the memory savings. With the default, the first time you give it a new combination of resolutions it takes ages (whole minutes), but subsequent runs are quick.
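
For reference, a minimal sketch of setting both variables from a small Python launcher instead of the console. It assumes you start ComfyUI from your own wrapper script; the plain set VAR=value commands above do the same thing:

```python
import os

# Must be set before torch/MIOpen are loaded, mirroring `set VAR=value` in the console.
os.environ["TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL"] = "1"  # AOTriton SDPA on TheRock Windows wheels
os.environ["MIOPEN_FIND_MODE"] = "FAST"                      # skip MIOpen's slow first-run kernel search

import torch  # noqa: E402  # imported only after the environment is prepared
```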

@0xDELUXA (Contributor, Author) commented Sep 22, 2025

> I've now tested the performance with PyTorch attention forcefully enabled and without it (vanilla Comfy) on a simple SDXL image-generation workflow:
>
> PyTorch attention disabled (default):
> first generation: 18.22 seconds
> subsequent generations: 14 seconds on average
>
> PyTorch attention enabled:
> first generation: 25.37 seconds
> subsequent generations: 15.4 seconds on average

How did you enable it forcefully? By merging this PR locally, or some other way? I don't think we have a flag specifically for enabling PyTorch attention in the VAE.
PyTorch attention as a whole is one thing; enabling it in the VAE is another.

You said:

> My console also prints out "Using pytorch attention in VAE", but only when I start Comfy. After that I only get "Using split attention in VAE".

When ComfyUI starts, it doesn't print anything about the VAE; it prints:
Using pytorch attention
Then, when we start generating an image, it prints:
Using split attention in VAE (without this PR), or
Using pytorch attention in VAE (with this PR).

@0xDELUXA (Contributor, Author) commented Sep 22, 2025

@RandomGitUser321

Yes, without the TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 env var it shows a warning and can't use flash or memory-efficient attention at all.

If this is a MIOpen thing, then I'm thinking of filing an issue there.

My finding is that it isn’t the attention mechanism in the VAE that causes crashes.
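
For what it's worth, here is a minimal sketch (not ComfyUI code) for checking whether flash / memory-efficient SDPA actually runs on the card, independent of what ComfyUI picks for the VAE:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

device = "cuda"  # ROCm builds expose the GPU through the torch.cuda API
q = torch.randn(1, 8, 1024, 64, device=device, dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Restrict SDPA to the flash / memory-efficient backends; if neither is usable
# (e.g. AOTriton not enabled), this errors out instead of silently using the math fallback.
with sdpa_kernel([SDPBackend.FLASH_ATTENTION, SDPBackend.EFFICIENT_ATTENTION]):
    out = F.scaled_dot_product_attention(q, k, v)
print("SDPA ran, output shape:", tuple(out.shape))
```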

@RandomGitUser321 (Contributor) commented Sep 22, 2025

> If this is a MIOpen thing, then I'm thinking of filing an issue there.

I'm pretty sure it is, and I think there may already be some issues filed for it. For instance, here's one: ROCm/rocm-libraries#1571
It's not 100% directly related to this issue, but it at least shows there are problems between MIOpen and VAE encoding/decoding.

@0xDELUXA (Contributor, Author) commented Sep 22, 2025

> I'm pretty sure it is, and I think there may already be some issues filed for it. For instance, here's one: ROCm/rocm-libraries#1571
> It's not 100% directly related to this issue, but it at least shows there are problems between MIOpen and VAE encoding/decoding.

Really good, so the devs know there's something wrong. I'll try some runs with verbose MIOpen logging to get some logs, but I don't think I can debug them locally; this seems more like an internal thing.

@RandomGitUser321 (Contributor)

You should be able to do some pretty spammy logging with it:
https://rocm.docs.amd.com/projects/MIOpen/en/latest/how-to/debug-log.html
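
A hedged sketch of what that usually looks like when launching from Python; the variable names are taken from the linked debugging docs, and the exact levels are worth double-checking there:

```python
import os

# MIOpen logging switches from the linked debugging docs; MIOPEN_LOG_LEVEL 7 is the most verbose.
os.environ["MIOPEN_ENABLE_LOGGING"] = "1"
os.environ["MIOPEN_ENABLE_LOGGING_CMD"] = "1"  # also prints reproducible MIOpenDriver command lines
os.environ["MIOPEN_LOG_LEVEL"] = "7"

import torch  # noqa: E402  # MIOpen reads these once the ROCm runtime loads
```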
