Ollama tries to mount models in RAM instead of VRAM

I am getting the **Error 500: model requires more system memory (5.6 GiB) than is available (4.5 GiB).**

Both Ollama and Open WebUI have no issues starting up. I managed to download deepseek-r1 using the UI. The error appears when I try to chat with it.

**Ollama Logs**
This shows that Intel GPU is detected by Ollama upon the container startup.

```
2025-09-23 02:44:40.375869+00:00time=2025-09-23T10:44:40.375+08:00 level=INFO source=routes.go:1288 msg="Listening on [::]:11434 (version 0.9.3)"
2025-09-23 02:44:40.376969+00:00time=2025-09-23T10:44:40.376+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
2025-09-23 02:44:40.376987+00:00time=2025-09-23T10:44:40.376+08:00 level=INFO source=gpu.go:218 msg="using Intel GPU"
2025-09-23 02:44:40.520863+00:00time=2025-09-23T10:44:40.520+08:00 level=WARN source=amd_linux.go:61 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers" error="amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
2025-09-23 02:44:40.521490+00:00time=2025-09-23T10:44:40.521+08:00 level=INFO source=amd_linux.go:296 msg="unsupported Radeon iGPU detected skipping" id=0 total="512.0 MiB"
2025-09-23 02:44:40.521512+00:00time=2025-09-23T10:44:40.521+08:00 level=INFO source=amd_linux.go:402 msg="no compatible amdgpu devices detected"
2025-09-23 02:44:40.521527+00:00time=2025-09-23T10:44:40.521+08:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="30.7 GiB" available="3.6 GiB"
```

**Docker-Compose configuration to reproduce**
```
services:
  ollama-intel-arc:
    image: intelanalytics/ipex-llm-inference-cpp-xpu:latest
    container_name: ollama-intel-arc
    restart: unless-stopped
    devices:
      - /dev/dri:/dev/dri
    volumes:
      - ollama-volume:/root/.ollama
    ports:
      - 11434:11434
    environment:
      - no_proxy=localhost,127.0.0.1
      - OLLAMA_HOST=0.0.0.0
      - DEVICE=Arc
      - OLLAMA_INTEL_GPU=true
      - OLLAMA_NUM_GPU=999
      - ZES_ENABLE_SYSMAN=1
    command: sh -c 'mkdir -p /llm/ollama && cd /llm/ollama && init-ollama && exec ./ollama serve'

  open-webui:
    image: ghcr.io/open-webui/open-webui:latest
    container_name: open-webui
    volumes:
      - open-webui-volume:/app/backend/data
    depends_on:
      - ollama-intel-arc
    ports:
      - 4040:8080
    environment:
      - WEBUI_AUTH=False
      - ENABLE_OPENAI_API=False
      - ENABLE_OLLAMA_API=True
      - ENABLE_IMAGE_GENERATION=True
      - IMAGE_GENERATION_ENGINE=automatic1111
      - IMAGE_GENERATION_MODEL=dreamshaper_8
      - IMAGE_SIZE=400x400
      - IMAGE_STEPS=8
      - AUTOMATIC1111_BASE_URL=http://sdnext-ipex:7860/
      - AUTOMATIC1111_CFG_SCALE=2
      - AUTOMATIC1111_SAMPLER=DPM++ SDE
      - AUTOMATIC1111_SCHEDULER=Karras
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped

volumes:
  ollama-volume: {}
  open-webui-volume: {}
```

**Environment information** (Ran the script from within the Ollama container)
```
OS Version: TrueNAS-SCALE-24.10.2.4
-----------------------------------------------------------------
PYTHON_VERSION=3.11.13
-----------------------------------------------------------------
/usr/local/lib/python3.11/dist-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
transformers=4.36.2
-----------------------------------------------------------------
torch=2.2.0+cu121
-----------------------------------------------------------------
ipex-llm Version: 2.3.0b20250826
-----------------------------------------------------------------
IPEX is not installed. 
-----------------------------------------------------------------
CPU Information: 
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        48 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               12
On-line CPU(s) list:                  0-11
Vendor ID:                            AuthenticAMD
Model name:                           AMD Ryzen 5 5600G with Radeon Graphics
CPU family:                           25
Model:                                80
Thread(s) per core:                   2
Core(s) per socket:                   6
Socket(s):                            1
Stepping:                             0
CPU max MHz:                          4464.0000
CPU min MHz:                          400.0000
BogoMIPS:                             7799.90
-----------------------------------------------------------------
Total CPU Memory: 30.7229 GB
Memory Type: sudo: dmidecode: command not found
-----------------------------------------------------------------
Operating System: 
Ubuntu 22.04.5 LTS \n \l

-----------------------------------------------------------------
Linux 19aedd4757ba 6.6.44-production+truenas #1 SMP PREEMPT_DYNAMIC Wed Aug  6 20:07:31 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
-----------------------------------------------------------------
./env-check.sh: line 148: xpu-smi: command not found
-----------------------------------------------------------------
./env-check.sh: line 154: clinfo: command not found
-----------------------------------------------------------------
Driver related package version:
ii  intel-level-zero-gpu                             1.6.32224.5                             amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  intel-level-zero-gpu-legacy1                     1.3.30872.22                            amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  level-zero-devel                                 1.20.2                                  amd64        oneAPI Level Zero
-----------------------------------------------------------------
igpu not detected
-----------------------------------------------------------------
xpu-smi is not installed. Please install xpu-smi according to README.md
```

**Additional context**
I have Intel Arc B580 (12GB) and Intel Arc Pro B50 (16GB) installed in the system at the same time. 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Ollama tries to mount models in RAM instead of VRAM #13312

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Ollama tries to mount models in RAM instead of VRAM #13312

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions