Misc. bug: Critical Crash and Performance Degradation in Vulkan Build since Release B6524 #16301

@Basten7

Description

After upgrading to release b6524, the application shows a significant slowdown in the PP (prompt processing) phase, and the run sometimes ends in a crash when running with Vulkan acceleration. The crash is a vk::DeviceLostError thrown during waitForFences, indicating a Vulkan device loss.

Operating system: macOS 12.6

Steps to Reproduce

  1. Run the following command (using AMD GPUs):
    ./build/bin/llama-bench -mg 4 -sm none -m ~/Models/llama-2-7b-q4_0.gguf -fa 0

Typical log output for a release before b6524:

./build/bin/llama-bench -mg 1 -sm none -m ~/Models/llama-2-7b-q4_0.gguf
ggml_vulkan: Found 5 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 XT (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 2 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 3 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 4 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none

| model         |     size | params | backend     | threads | main_gpu | sm   | test  |           t/s |
| ------------- | -------: | -----: | ----------- | ------: | -------: | ---- | ----- | ------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,BLAS |      12 |        1 | none | pp512 | 431.45 ± 0.15 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,BLAS |      12 |        1 | none | tg128 |  83.08 ± 2.65 |

build: 9ebfcceb (6140)

Typical log output for b6524 and later releases:

./build/bin/llama-bench -mg 1 -sm none -m ~/Models/llama-2-7b-q4_0.gguf -fa 0
ggml_vulkan: Found 5 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 XT (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 2 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 3 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 4 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none

| model         |     size | params | backend     | threads | main_gpu | sm   | fa | test  |          t/s |
| ------------- | -------: | -----: | ----------- | ------: | -------: | ---- | -: | ----- | -----------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,BLAS |      12 |        0 | none |  1 | pp512 | 12.78 ± 0.01 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,BLAS |      12 |        0 | none |  1 | tg128 | 84.70 ± 0.59 |

build: b995a10 (6593)

./build/bin/llama-bench -mg 1 -sm none -m ~/Models/llama-2-7b-q4_0.gguf -fa 1
ggml_vulkan: Found 5 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 XT (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 2 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 3 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 4 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none

| model         |     size | params | backend     | threads | main_gpu | sm   | fa | test  |          t/s |
| ------------- | -------: | -----: | ----------- | ------: | -------: | ---- | -: | ----- | -----------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,BLAS |      12 |        1 | none |  1 | pp512 | 12.78 ± 0.01 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,BLAS |      12 |        1 | none |  1 | tg128 | 84.70 ± 0.59 |

build: b995a10 (6593)

Typical log output for the crash:
(lldb) process attach --pid 38556
error: attach failed: attach failed (Not allowed to attach to process. Look in the console messages (Console.app), near the debugserver entries, when the attach failed. The subsystem that denied the attach permission will likely have logged an informative message about why it was denied.)
libc++abi: terminating due to uncaught exception of type vk::DeviceLostError: vk::Device::waitForFences: ErrorDeviceLost
zsh: abort ./build/bin/llama-bench -mg 4 -sm none -m -fa 0
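
For context on the abort: with Vulkan-Hpp's default exception handling, vk::Device::waitForFences returns only the success codes (eSuccess, eTimeout) and throws vk::DeviceLostError when the driver reports VK_ERROR_DEVICE_LOST. Below is a minimal C++ sketch of that failure mode; the wait_for_fence helper is hypothetical and is not llama.cpp's actual code.

#include <vulkan/vulkan.hpp>

#include <cstdint>
#include <cstdio>

// Hypothetical helper showing where the reported exception originates.
// If the exception is left uncaught, as in the log above, it reaches
// std::terminate and the process aborts ("libc++abi: terminating ...").
static bool wait_for_fence(vk::Device device, vk::Fence fence, uint64_t timeout_ns) {
    try {
        // Only eSuccess and eTimeout are returned here; error codes such as
        // VK_ERROR_DEVICE_LOST are converted into exceptions by Vulkan-Hpp.
        vk::Result res = device.waitForFences(fence, VK_TRUE, timeout_ns);
        return res == vk::Result::eSuccess;
    } catch (const vk::DeviceLostError & e) {
        // Device loss typically means the GPU hung or the driver/MoltenVK
        // stack faulted while executing previously submitted work.
        std::fprintf(stderr, "vulkan: device lost waiting on fence: %s\n", e.what());
        return false;
    }
}

Catching the error this way would only turn the abort into a clean failure; the device loss itself still points at a GPU hang introduced somewhere in the commit range listed under First Bad Commit.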

Operating systems

No response

Which llama.cpp modules do you know to be affected?

No response

Command line

./build/bin/llama-bench -mg 1 -sm none -m ~/Models/llama-2-7b-q4_0.gguf -fa 0

Problem description & steps to reproduce

Crash, or severe performance degradation in the PP phase: pp512 throughput drops from ~431 t/s to ~13 t/s (roughly a 34x slowdown).

First Bad Commit

b6524; the following commits are the candidates to test:
1384abf
e6d65fb
8656f5d
0499b29
3f81b4e
f2a789e
f505bd8
0889589
3ecb2f6
96fdca0
ec65fb5
a20d810
9073a73
28baac9
1eeb523
5bb4a3e
commits_to_test.txt
