Name and Version
llama-bench, build b995a10 (6593); the regression first appears in release b6524 (see "First Bad Commit" below).

Description
After upgrading to release b6524, the application shows a significant slowdown in the PP (prompt processing) phase when running with Vulkan acceleration, and the run sometimes ends in a crash. The crash is a vk::DeviceLostError thrown during waitForFences, indicating a Vulkan device loss.

OS: macOS 12.6
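For context on the exception in the log: with Vulkan-Hpp's default exception-based error handling, vk::Device::waitForFences returns eSuccess or eTimeout, but throws vk::DeviceLostError when the driver reports VK_ERROR_DEVICE_LOST. A minimal sketch of a diagnostic wrapper (hypothetical; wait_with_diagnostics is illustrative, not llama.cpp's actual code) that logs context before the process aborts:

```cpp
// Hypothetical diagnostic wrapper, NOT llama.cpp code: shows where the
// vk::DeviceLostError in the crash log originates. With exceptions enabled,
// vk::Device::waitForFences returns eSuccess/eTimeout and throws
// vk::DeviceLostError on VK_ERROR_DEVICE_LOST.
#include <vulkan/vulkan.hpp>
#include <cstdint>
#include <cstdio>

vk::Result wait_with_diagnostics(vk::Device device, vk::Fence fence, uint64_t timeout_ns) {
    try {
        return device.waitForFences(fence, VK_TRUE, timeout_ns);
    } catch (const vk::DeviceLostError & e) {
        // Device loss is unrecoverable; log before the abort seen in the report.
        std::fprintf(stderr, "device lost in waitForFences: %s\n", e.what());
        throw; // left uncaught, this produces the libc++abi terminate in the log below
    }
}
```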
Steps to Reproduce
- Run the following command (using AMD GPUs):
./build/bin/llama-bench -mg 4 -sm none -m ~/Models/llama-2-7b-q4_0.gguf -fa 0
Typical log output for a release before b6524
./build/bin/llama-bench -mg 1 -sm none -m ~/Models/llama-2-7b-q4_0.gguf
ggml_vulkan: Found 5 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 XT (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 2 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 3 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 4 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
model | size | params | backend | threads | main_gpu | sm | test | t/s
---|---|---|---|---|---|---|---|---
llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,BLAS | 12 | 1 | none | pp512 | 431.45 ± 0.15
llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,BLAS | 12 | 1 | none | tg128 | 83.08 ± 2.65
build: 9ebfcceb (6140)
Typical log output for a release after b6524
./build/bin/llama-bench -mg 1 -sm none -m ~/Models/llama-2-7b-q4_0.gguf -fa 0
ggml_vulkan: Found 5 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 XT (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 2 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 3 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 4 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
model | size | params | backend | threads | main_gpu | sm | fa | test | t/s
---|---|---|---|---|---|---|---|---|---
llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,BLAS | 12 | 0 | none | 1 | pp512 | 12.78 ± 0.01
llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,BLAS | 12 | 0 | none | 1 | tg128 | 84.70 ± 0.59
build: b995a10 (6593)
./build/bin/llama-bench -mg 1 -sm none -m ~/Models/llama-2-7b-q4_0.gguf -fa 1
ggml_vulkan: Found 5 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 XT (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 2 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 3 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 4 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none

model | size | params | backend | threads | main_gpu | sm | fa | test | t/s
---|---|---|---|---|---|---|---|---|---
llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,BLAS | 12 | 1 | none | 1 | pp512 | 12.78 ± 0.01
llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,BLAS | 12 | 1 | none | 1 | tg128 | 84.70 ± 0.59

build: b995a10 (6593)

Typical log output for crash:
(lldb) process attach --pid 38556
error: attach failed: attach failed (Not allowed to attach to process. Look in the console messages (Console.app), near the debugserver entries, when the attach failed. The subsystem that denied the attach permission will likely have logged an informative message about why it was denied.)
libc++abi: terminating due to uncaught exception of type vk::DeviceLostError: vk::Device::waitForFences: ErrorDeviceLost
zsh: abort ./build/bin/llama-bench -mg 4 -sm none -m -fa 0
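When reproducing this, it may help to tell a GPU hang (repeated fence timeouts before MoltenVK reports the loss) apart from an immediate device loss. A minimal sketch under the same assumptions (plain Vulkan-Hpp; wait_bounded is illustrative, not the backend's actual synchronization code):

```cpp
#include <vulkan/vulkan.hpp>
#include <cstdio>

// Poll a fence in 1 s slices so a hang shows up as repeated timeouts in the
// log instead of an indefinite block. A genuine device loss still throws
// vk::DeviceLostError out of waitForFences, matching the crash above.
bool wait_bounded(vk::Device device, vk::Fence fence, int max_seconds) {
    for (int s = 0; s < max_seconds; ++s) {
        if (device.waitForFences(fence, VK_TRUE, 1000000000ull) == vk::Result::eSuccess) {
            return true;
        }
        std::fprintf(stderr, "fence still pending after %d s\n", s + 1);
    }
    return false;
}
```

If the bounded wait keeps timing out before the DeviceLostError appears, that would point at a shader hang (consistent with the PP slowdown) rather than an instantaneous device reset.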
Operating systems
No response
Which llama.cpp modules do you know to be affected?
No response
Command line
./build/bin/llama-bench -mg 1 -sm none -m ~/Models/llama-2-7b-q4_0.gguf -fa 0
Problem description & steps to reproduce
Crash, or severe performance degradation in the PP phase: from roughly 420 t/s down to 12 t/s.
First Bad Commit
b6524 and the following commits:
1384abf
e6d65fb
8656f5d
0499b29
3f81b4e
f2a789e
f505bd8
0889589
3ecb2f6
96fdca0
ec65fb5
a20d810
9073a73
28baac9
1eeb523
5bb4a3e
commits_to_test.txt