Enable PyTorch attention in VAE for AMD RDNA 4 #9956
Conversation
Potential duplicate of #8289.
Could be, but this only affects RDNA4. Based on my testing, it benefits from PyTorch attention. Can't speak for the other archs, though.
@0xDELUXA ComfyUI automatically uses PyTorch attention, and so far I've had no issues. But perhaps I haven't come across your specific workflow yet:
@A-Temur Without this PR I had:

After merging this PR:

As you can see, I'm using a TheRock wheel for Windows. It's in a nightly state and not yet available on pytorch.org. Also, does your console say:
@0xDELUXA Prior to that I was using ROCm 6.4.3 with PyTorch 2.5.1 in a Docker setup (on Fedora), and I didn't get the message about PyTorch attention, but performance was very poor (very long loading times, especially before starting KSampler and VAE decode). Since switching to ROCm 7 with PyTorch 2.8 on Ubuntu (no Docker), the performance increase is huge and I've had no issues so far. I would highly recommend the recommended Ubuntu/RHEL installation, since ROCm + Radeon didn't do well in my experience on Windows + WSL or other unsupported Linux distros. On which specific models/workflows do you get the mentioned crash (without this PR)?
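For anyone else comparing setups, here's a quick way to confirm which build you're actually running. A minimal sketch; `torch.version.hip` is populated only on ROCm builds of PyTorch:

```python
import torch

# Which PyTorch build is this? torch.version.hip is a version string on
# ROCm builds and None on CUDA builds.
print("torch:", torch.__version__)
print("hip:", torch.version.hip)

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # On ROCm builds, gcnArchName reports the GPU ISA, e.g. "gfx1200" for RDNA4.
    print("device:", props.name, getattr(props, "gcnArchName", "n/a"))
```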
I'm not talking about Windows + WSL. It's native PyTorch on Windows.
@0xDELUXA
This specific PR doesn't do anything about the VAE crashes; it just makes AMD use the better attention optimization in the VAE. I don't think you'll get crashes on Linux at all. This is a Windows thing, AFAIK.
https://github.com/comfyanonymous/ComfyUI/blob/master/comfy/model_management.py#L1116 I can share my workflow later, when I'm back online.
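Since that line is where the attention implementation gets picked: a quick way to see which PyTorch SDPA backends are enabled on a given build. A sketch using the standard `torch.backends` query functions, not ComfyUI code:

```python
import torch

# PyTorch's scaled_dot_product_attention can dispatch to several backends;
# these report which ones are currently enabled for this build.
print("flash:", torch.backends.cuda.flash_sdp_enabled())
print("mem_efficient:", torch.backends.cuda.mem_efficient_sdp_enabled())
print("math:", torch.backends.cuda.math_sdp_enabled())
```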
I've now tested performance with PyTorch attention forcibly enabled versus without it (vanilla Comfy) on a simple image-gen SDXL model:

PyTorch attention disabled (default):

PyTorch attention enabled:
Might be partially related, but are you guys also making sure to use …? Also, I think MIOpen might be playing a part in the VAE encode/decode issues. If using …
How did you enable it forcefully? By merging this PR locally, or how? I don't think we have a flag specifically for enabling PyTorch attention in the VAE. You said:
When ComfyUI starts, it doesn't print anything about the VAE. It prints:
Yes, without the … If this is a MIOpen thing, then I'm thinking of filing an issue there. My finding is that it isn't the attention mechanism in the VAE that causes the crashes.
I'm pretty sure it is, and I think there may already be some issues filed for it. For instance, here's one: ROCm/rocm-libraries#1571
Really good, so the devs know that there's something wrong. I'll try some runs with MIOpen verbose logging to get some logs, but I don't think I can debug them locally. This seems more like an internal thing.
You should be able to do some pretty spammy logging with it:
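For reference, MIOpen's verbose logging is driven by environment variables that must be in place before the first kernel runs. A minimal sketch using the documented `MIOPEN_*` knobs, set from Python before importing torch:

```python
import os

# Set before MIOpen initializes (i.e., before the first conv/VAE op runs).
os.environ["MIOPEN_ENABLE_LOGGING"] = "1"      # log MIOpen API calls
os.environ["MIOPEN_ENABLE_LOGGING_CMD"] = "1"  # print MIOpenDriver reproducer commands
os.environ["MIOPEN_LOG_LEVEL"] = "7"           # 7 = most verbose

import torch  # import after setting the env so the library sees it
```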
Crashes occasionally occur at high resolutions, with or without PyTorch attention in the VAE.
For normal image sizes, though, it's faster.
Tested on Windows (native PyTorch, not WSL) with a gfx1200 card: it now uses PyTorch attention in the VAE. RDNA 3 and earlier cards continue to use split attention as before.
Partially reverts “Disable PyTorch attention in VAE for AMD.” (commit 1cd6cd6) for RDNA4.
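Roughly, the change amounts to gating the VAE attention choice on the reported GPU architecture. Below is a minimal sketch of the idea, not the actual diff; the `gfx12` prefix check and the `supports_pytorch_vae_attention` name are my assumptions:

```python
import torch

def supports_pytorch_vae_attention(device_index: int = 0) -> bool:
    # Hypothetical helper: RDNA4 GPUs report ISA names like "gfx1200" via
    # gcnArchName on ROCm builds (assumption: a prefix check is sufficient).
    if not torch.cuda.is_available():
        return False
    props = torch.cuda.get_device_properties(device_index)
    return getattr(props, "gcnArchName", "").startswith("gfx12")

# RDNA4 gets PyTorch scaled-dot-product attention in the VAE; RDNA 3 and
# earlier keep the split-attention fallback, as before this PR.
use_pytorch_attention_in_vae = supports_pytorch_vae_attention()
```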