Name and Version
llama-bench, build b995a10 (6593); the regression first appears in release b6524 (see "First Bad Commit" below).

Description
After upgrading to release b6524, the application shows a significant slowdown in the PP (prompt processing) phase when running with Vulkan acceleration, and the run sometimes ends in a crash. The crash is a vk::DeviceLostError thrown during waitForFences, indicating a Vulkan device loss.

OS: macOS 12.6
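For context on the exception in the log: with Vulkan-Hpp's default exception-based error handling, vk::Device::waitForFences returns eSuccess or eTimeout, but throws vk::DeviceLostError when the driver reports VK_ERROR_DEVICE_LOST. A minimal sketch of a diagnostic wrapper (hypothetical; wait_with_diagnostics is illustrative, not llama.cpp's actual code) that logs context before the process aborts:

```cpp
// Hypothetical diagnostic wrapper, NOT llama.cpp code: shows where the
// vk::DeviceLostError in the crash log originates. With exceptions enabled,
// vk::Device::waitForFences returns eSuccess/eTimeout and throws
// vk::DeviceLostError on VK_ERROR_DEVICE_LOST.
#include <vulkan/vulkan.hpp>
#include <cstdint>
#include <cstdio>

vk::Result wait_with_diagnostics(vk::Device device, vk::Fence fence, uint64_t timeout_ns) {
    try {
        return device.waitForFences(fence, VK_TRUE, timeout_ns);
    } catch (const vk::DeviceLostError & e) {
        // Device loss is unrecoverable; log before the abort seen in the report.
        std::fprintf(stderr, "device lost in waitForFences: %s\n", e.what());
        throw; // left uncaught, this produces the libc++abi terminate in the log below
    }
}
```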
Steps to Reproduce
- Run the following command (using AMD GPUs):
./build/bin/llama-bench -mg 4 -sm none -m ~/Models/llama-2-7b-q4_0.gguf -fa 0
Typical log output for a release before b6524
./build/bin/llama-bench -mg 1 -sm none -m ~/Models/llama-2-7b-q4_0.gguf
ggml_vulkan: Found 5 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 XT (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 2 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 3 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 4 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
model | size | params | backend | threads | main_gpu | sm | test | t/s
---|---|---|---|---|---|---|---|---
llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,BLAS | 12 | 1 | none | pp512 | 431.45 ± 0.15
llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,BLAS | 12 | 1 | none | tg128 | 83.08 ± 2.65
build: 9ebfcceb (6140)
Typical log output for a release after b6524
./build/bin/llama-bench -mg 1 -sm none -m ~/Models/llama-2-7b-q4_0.gguf -fa 0
ggml_vulkan: Found 5 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 XT (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 2 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 3 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 4 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
model | size | params | backend | threads | main_gpu | sm | fa | test | t/s
---|---|---|---|---|---|---|---|---|---
llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,BLAS | 12 | 0 | none | 1 | pp512 | 12.78 ± 0.01
llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,BLAS | 12 | 0 | none | 1 | tg128 | 84.70 ± 0.59
build: b995a10 (6593)
./build/bin/llama-bench -mg 1 -sm none -m ~/Models/llama-2-7b-q4_0.gguf -fa 1
ggml_vulkan: Found 5 Vulkan devices:
ggml_vulkan: 0 = AMD Radeon RX 6800 XT (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 1 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 2 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 3 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none
ggml_vulkan: 4 = AMD Radeon PRO W6800X Duo (MoltenVK) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 0 | matrix cores: none

model | size | params | backend | threads | main_gpu | sm | fa | test | t/s
---|---|---|---|---|---|---|---|---|---
llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,BLAS | 12 | 1 | none | 1 | pp512 | 12.78 ± 0.01
llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan,BLAS | 12 | 1 | none | 1 | tg128 | 84.70 ± 0.59

build: b995a10 (6593)

Typical log output for crash:
(lldb) process attach --pid 38556
error: attach failed: attach failed (Not allowed to attach to process. Look in the console messages (Console.app), near the debugserver entries, when the attach failed. The subsystem that denied the attach permission will likely have logged an informative message about why it was denied.)
libc++abi: terminating due to uncaught exception of type vk::DeviceLostError: vk::Device::waitForFences: ErrorDeviceLost
zsh: abort ./build/bin/llama-bench -mg 4 -sm none -m -fa 0
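When reproducing this, it may help to tell a GPU hang (repeated fence timeouts before MoltenVK reports the loss) apart from an immediate device loss. A minimal sketch under the same assumptions (plain Vulkan-Hpp; wait_bounded is illustrative, not the backend's actual synchronization code):

```cpp
#include <vulkan/vulkan.hpp>
#include <cstdio>

// Poll a fence in 1 s slices so a hang shows up as repeated timeouts in the
// log instead of an indefinite block. A genuine device loss still throws
// vk::DeviceLostError out of waitForFences, matching the crash above.
bool wait_bounded(vk::Device device, vk::Fence fence, int max_seconds) {
    for (int s = 0; s < max_seconds; ++s) {
        if (device.waitForFences(fence, VK_TRUE, 1000000000ull) == vk::Result::eSuccess) {
            return true;
        }
        std::fprintf(stderr, "fence still pending after %d s\n", s + 1);
    }
    return false;
}
```

If the bounded wait keeps timing out before the DeviceLostError appears, that would point at a shader hang (consistent with the PP slowdown) rather than an instantaneous device reset.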
Operating systems
No response
Which llama.cpp modules do you know to be affected?
No response
Command line
./build/bin/llama-bench -mg 1 -sm none -m ~/Models/llama-2-7b-q4_0.gguf -fa 0
Problem description & steps to reproduce
Crash, or severe performance degradation in the PP phase: from roughly 420 t/s down to 12 t/s.
First Bad Commit
b6524 and the following commits:
1384abf
e6d65fb
8656f5d
0499b29
3f81b4e
f2a789e
f505bd8
0889589
3ecb2f6
96fdca0
ec65fb5
a20d810
9073a73
28baac9
1eeb523
5bb4a3e
commits_to_test.txt