Crash due to unhandled exception from ggml_vk_allocate in llama_kv_cache_init

### System Info

GPT4All version: 2.6.1

1. OS, kernel and Python 

```bash
karthik@fedora:~$ cat /etc/fedora-release | cut -c -17 && uname -sr && python --version && python3.10 --version
Fedora release 39
Linux 6.6.13-200.fc39.x86_64
Python 3.12.1
Python 3.10.13
```

2. Memory info

```bash
karthik@fedora:~$ free
               total        used        free      shared  buff/cache   available
Mem:        15633228     2679428     4617172       38508     8723496    12953800
Swap:       52236280      683776    51552504
```
```bash
karthik@fedora:~$ swapon
NAME           TYPE       SIZE   USED PRIO
/dev/nvme0n1p6 partition   20G     0B    1
/dev/zram0     partition 29.8G 666.8M  100
```

3. GPU and CPU info 

```bash
karthik@fedora:~$ inxi
CPU: 8-core AMD Ryzen 7 7840HS w/ Radeon 780M Graphics (-MT MCP-)
speed/min/max: 709/400/5137:5293:5608:6080:5449:5764:5924 MHz
Kernel: 6.6.13-200.fc39.x86_64 x86_64 Up: 1h 21m Mem: 2.53/14.91 GiB (17.0%)
Storage: 1011.17 GiB (18.0% used) Procs: 413 Shell: Bash inxi: 3.3.31
karthik@fedora:~$ 
```

```bash
karthik@fedora:~$ nvidia-smi
Wed Jan 24 20:38:10 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4060 ...    On  | 00000000:01:00.0 Off |                  N/A |
| N/A   42C    P0              19W / 105W |      7MiB /  8188MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
```

### Information

- [ ] The official example notebooks/scripts
- [ ] My own modified scripts

### Reproduction

1. Try to load the Wizard 1.2 model (`wizardlm-13b-v1.2.Q4_0.gguf`).
2. The Vulkan memory allocation fails and the application crashes with an unhandled exception.

```bash
karthik@fedora:~$ ./gpt4all/bin/chat 
[Warning] (Wed Jan 24 20:34:29 2024): Could not find the Qt platform plugin "wayland" in ""
[Warning] (Wed Jan 24 20:34:29 2024): Could not connect "org.freedesktop.IBus" to globalEngineChanged(QString)
[Debug] (Wed Jan 24 20:34:32 2024): deserializing chat "/home/karthik/.local/share/nomic.ai/GPT4All//gpt4all-5a42a64c-864a-458f-beec-508935f6c28e.chat"
[Debug] (Wed Jan 24 20:34:32 2024): deserializing chat "/home/karthik/.local/share/nomic.ai/GPT4All//gpt4all-5c19a817-c6ab-4891-8f53-b0d65e3c6eed.chat"
[Debug] (Wed Jan 24 20:34:32 2024): deserializing chats took: 3 ms
[Warning] (Wed Jan 24 20:34:32 2024): ERROR: Previous attempt to load model resulted in crash for `wizardlm-13b-v1.2.Q4_0.gguf` most likely due to insufficient memory. You should either remove this model or decrease your system RAM usage by closing other applications. id "1ef2661f-f65a-4cba-a2d6-31bc4d85b5a2"
Error allocating memory ErrorOutOfDeviceMemory
[Warning] (Wed Jan 24 20:34:43 2024): Qt has caught an exception thrown from an event handler. Throwing
exceptions from an event handler is not supported in Qt.
You must not let any exception whatsoever propagate through Qt code.
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error allocating vulkan memory.
Aborted (core dumped)
```
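
The log shows a `std::runtime_error` escaping through a Qt event handler, which Qt does not support, so the process aborts. A minimal sketch of one way to contain such an error at the call site is given here. The names `LoadResult`, `allocate_device_memory`, and `try_load` are hypothetical and are not part of GPT4All, llama.cpp, or Qt; the sketch only illustrates converting the exception into a value that a GUI could report.

```cpp
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical result type: success flag plus an error message (empty on success).
struct LoadResult {
    bool ok;
    std::string error;
};

// Hypothetical stand-in for a device allocation that throws on failure,
// mirroring the message seen in the log above.
inline void allocate_device_memory(std::size_t requested, std::size_t available) {
    if (requested > available) {
        throw std::runtime_error("Error allocating vulkan memory.");
    }
}

// Run a load step and turn any exception into a LoadResult, so that nothing
// propagates into the caller (for example, an event handler).
inline LoadResult try_load(const std::function<void()>& load_step) {
    try {
        load_step();
        return LoadResult{true, std::string()};
    } catch (const std::exception& e) {
        return LoadResult{false, e.what()};
    } catch (...) {
        return LoadResult{false, "unknown error"};
    }
}
```

A caller could then check `result.ok` and either show `result.error` to the user or retry on the CPU, rather than letting the process terminate.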

### Expected behavior

1. Try to load the Wizard 1.2 model.
2. The model loads and I can interact with the chatbot in the GPT4All GUI. **This works on the same system under Windows.**

> Smaller models work on Linux, both with and without the GPU, and throw no such errors.
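
A rough back-of-the-envelope estimate may help explain why only the larger model fails. The sketch below assumes about 13 billion parameters (inferred from the `13b` in the file name), 4.5 bits per weight for the Q4_0 format, and that all weights are placed on the GPU; the device totals come from the `nvidia-smi` output above. Under these assumptions the weights alone take about 6,970 MiB of the 8,188 MiB available, leaving roughly 1,200 MiB for the KV cache and other buffers. This is an estimate, not a measurement of what GPT4All actually allocates.

```cpp
// Approximate size in MiB of model weights stored at a given bits-per-weight.
constexpr double weights_mib(double n_params, double bits_per_weight) {
    return n_params * bits_per_weight / 8.0 / (1024.0 * 1024.0);
}

// Device memory left over (in MiB) after the weights are resident.
constexpr double headroom_mib(double total_mib, double used_mib, double weights) {
    return total_mib - used_mib - weights;
}

// Assumed inputs: 13e9 parameters, 4.5 bits per weight (Q4_0),
// 8188 MiB total and 7 MiB in use as reported by nvidia-smi.
constexpr double kWeightsMiB = weights_mib(13e9, 4.5);
constexpr double kHeadroomMiB = headroom_mib(8188.0, 7.0, kWeightsMiB);
```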
