Abnormal behavior when using Intel GPU SYCL on Ubuntu #13311

@ligjn

Description

My Device:
System: Ubuntu 24.03
ipex-llm: ipex-llm-ollama

[screenshots]

Observed error:
An error occurs when executing ollama run qwen2.5:0.5b.
[screenshot]

A more complete error log:
The following error output appears in the terminal where ollama serve is running.

[GIN] 2025/09/15 - 19:29:14 | 500 |  442.334217ms |       127.0.0.1 | POST     "/api/generate"
^Croot@dcg:/opt/aog/engine/ollama/ollama# ./ollama serve
time=2025-09-15T19:29:45.094+08:00 level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:16677 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/var/lib/aog/engine/ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-09-15T19:29:45.094+08:00 level=INFO source=images.go:476 msg="total blobs: 7"
time=2025-09-15T19:29:45.094+08:00 level=INFO source=images.go:483 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:   export GIN_MODE=release
 - using code:  gin.SetMode(gin.ReleaseMode)

[GIN-debug] HEAD   /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
[GIN-debug] GET    /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] GET    /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func4 (5 handlers)
[GIN-debug] POST   /api/pull                 --> github.com/ollama/ollama/server.(*Server).PullHandler-fm (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/ollama/ollama/server.(*Server).PushHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/ollama/ollama/server.(*Server).DeleteHandler-fm (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/ollama/ollama/server.(*Server).CreateHandler-fm (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/ollama/ollama/server.(*Server).CopyHandler-fm (5 handlers)
[GIN-debug] GET    /api/ps                   --> github.com/ollama/ollama/server.(*Server).PsHandler-fm (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
[GIN-debug] POST   /api/chat                 --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
[GIN-debug] POST   /api/embed                --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
[GIN-debug] POST   /v1/chat/completions      --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
[GIN-debug] POST   /v1/completions           --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
[GIN-debug] POST   /v1/embeddings            --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models                --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
[GIN-debug] GET    /v1/models/:model         --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
time=2025-09-15T19:29:45.094+08:00 level=INFO source=routes.go:1288 msg="Listening on [::]:16677 (version 0.9.3)"
time=2025-09-15T19:29:45.094+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-09-15T19:29:45.094+08:00 level=INFO source=gpu.go:218 msg="using Intel GPU"
time=2025-09-15T19:29:45.100+08:00 level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="30.9 GiB" available="25.8 GiB"
[GIN] 2025/09/15 - 19:29:48 | 200 |      27.735µs |       127.0.0.1 | HEAD     "/"
[GIN] 2025/09/15 - 19:29:48 | 200 |   28.881689ms |       127.0.0.1 | POST     "/api/show"
time=2025-09-15T19:29:48.779+08:00 level=INFO source=server.go:135 msg="system memory" total="30.9 GiB" free="25.8 GiB" free_swap="0 B"
time=2025-09-15T19:29:48.780+08:00 level=INFO source=server.go:187 msg=offload library=cpu layers.requested=-1 layers.model=25 layers.offload=0 layers.split="" memory.available="[25.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="732.6 MiB" memory.required.partial="0 B" memory.required.kv="48.0 MiB" memory.required.allocations="[732.6 MiB]" memory.weights.total="373.7 MiB" memory.weights.repeating="235.8 MiB" memory.weights.nonrepeating="137.9 MiB" memory.graph.full="298.5 MiB" memory.graph.partial="405.0 MiB"
llama_model_loader: loaded meta data with 34 key-value pairs and 290 tensors from /var/lib/aog/engine/ollama/models/blobs/sha256-c5396e06af294bd101b30dce59131a76d2b773e76950acc870eda801d3ab0515 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = qwen2
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Qwen2.5 0.5B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Qwen2.5
llama_model_loader: - kv   5:                         general.size_label str              = 0.5B
llama_model_loader: - kv   6:                            general.license str              = apache-2.0
llama_model_loader: - kv   7:                       general.license.link str              = https://huggingface.co/Qwen/Qwen2.5-0...
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = Qwen2.5 0.5B
llama_model_loader: - kv  10:          general.base_model.0.organization str              = Qwen
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/Qwen/Qwen2.5-0.5B
llama_model_loader: - kv  12:                               general.tags arr[str,2]       = ["chat", "text-generation"]
llama_model_loader: - kv  13:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  14:                          qwen2.block_count u32              = 24
llama_model_loader: - kv  15:                       qwen2.context_length u32              = 32768
llama_model_loader: - kv  16:                     qwen2.embedding_length u32              = 896
llama_model_loader: - kv  17:                  qwen2.feed_forward_length u32              = 4864
llama_model_loader: - kv  18:                 qwen2.attention.head_count u32              = 14
llama_model_loader: - kv  19:              qwen2.attention.head_count_kv u32              = 2
llama_model_loader: - kv  20:                       qwen2.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  21:     qwen2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  22:                          general.file_type u32              = 15
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = qwen2
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,151936]  = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,151936]  = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,151387]  = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv  28:                tokenizer.ggml.eos_token_id u32              = 151645
llama_model_loader: - kv  29:            tokenizer.ggml.padding_token_id u32              = 151643
llama_model_loader: - kv  30:                tokenizer.ggml.bos_token_id u32              = 151643
llama_model_loader: - kv  31:               tokenizer.ggml.add_bos_token bool             = false
llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {%- if tools %}\n    {{- '<|im_start|>...
llama_model_loader: - kv  33:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  121 tensors
llama_model_loader: - type q5_0:  132 tensors
llama_model_loader: - type q8_0:   13 tensors
llama_model_loader: - type q4_K:   12 tensors
llama_model_loader: - type q6_K:   12 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q4_K - Medium
print_info: file size   = 373.71 MiB (6.35 BPW) 
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch             = qwen2
print_info: vocab_only       = 1
print_info: model type       = ?B
print_info: model params     = 494.03 M
print_info: general.name     = Qwen2.5 0.5B Instruct
print_info: vocab type       = BPE
print_info: n_vocab          = 151936
print_info: n_merges         = 151387
print_info: BOS token        = 151643 '<|endoftext|>'
print_info: EOS token        = 151645 '<|im_end|>'
print_info: EOT token        = 151645 '<|im_end|>'
print_info: PAD token        = 151643 '<|endoftext|>'
print_info: LF token         = 198 'Ċ'
print_info: FIM PRE token    = 151659 '<|fim_prefix|>'
print_info: FIM SUF token    = 151661 '<|fim_suffix|>'
print_info: FIM MID token    = 151660 '<|fim_middle|>'
print_info: FIM PAD token    = 151662 '<|fim_pad|>'
print_info: FIM REP token    = 151663 '<|repo_name|>'
print_info: FIM SEP token    = 151664 '<|file_sep|>'
print_info: EOG token        = 151643 '<|endoftext|>'
print_info: EOG token        = 151645 '<|im_end|>'
print_info: EOG token        = 151662 '<|fim_pad|>'
print_info: EOG token        = 151663 '<|repo_name|>'
print_info: EOG token        = 151664 '<|file_sep|>'
print_info: max token length = 256
llama_model_load: vocab only - skipping tensors
time=2025-09-15T19:29:48.917+08:00 level=INFO source=server.go:458 msg="starting llama server" cmd="/opt/aog/engine/ollama/ollama/ollama-bin runner --model /var/lib/aog/engine/ollama/models/blobs/sha256-c5396e06af294bd101b30dce59131a76d2b773e76950acc870eda801d3ab0515 --ctx-size 4096 --batch-size 512 --n-gpu-layers 999 --threads 8 --no-mmap --parallel 2 --port 42509"
time=2025-09-15T19:29:48.917+08:00 level=INFO source=sched.go:483 msg="loaded runners" count=1
time=2025-09-15T19:29:48.917+08:00 level=INFO source=server.go:618 msg="waiting for llama runner to start responding"
time=2025-09-15T19:29:48.917+08:00 level=INFO source=server.go:652 msg="waiting for server to become available" status="llm server not responding"
using override patterns: []
time=2025-09-15T19:29:48.949+08:00 level=INFO source=runner.go:851 msg="starting go runner"
terminate called after throwing an instance of 'sycl::_V1::exception'
  what():  No device of requested type available. Please check https://software.intel.com/content/www/us/en/develop/articles/intel-oneapi-dpcpp-system-requirements.html
SIGABRT: abort
PC=0x7f96256a70fc m=0 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 1 gp=0xc000002380 m=0 mp=0x1ecf7c0 [syscall]:
runtime.cgocall(0x1157b90, 0xc000439538)
        /usr/local/go/src/runtime/cgocall.go:167 +0x4b fp=0xc000439510 sp=0xc0004394d8 pc=0x48398b
github.com/ollama/ollama/ml/backend/ggml/ggml/src._Cfunc_ggml_backend_load_all_from_path(0x26b6a910)
        _cgo_gotypes.go:195 +0x3a fp=0xc000439538 sp=0xc000439510 pc=0x830dfa
github.com/ollama/ollama/ml/backend/ggml/ggml/src.init.func1.1({0xc000042014, 0x1d})
        /home/arda/ruonan/ollama-internal/ml/backend/ggml/ggml/src/ggml.go:97 +0xf5 fp=0xc0004395d0 sp=0xc000439538 pc=0x830895
github.com/ollama/ollama/ml/backend/ggml/ggml/src.init.func1()
        /home/arda/ruonan/ollama-internal/ml/backend/ggml/ggml/src/ggml.go:98 +0x526 fp=0xc000439860 sp=0xc0004395d0 pc=0x8306e6
github.com/ollama/ollama/ml/backend/ggml/ggml/src.init.OnceFunc.func2()
        /usr/local/go/src/sync/oncefunc.go:27 +0x62 fp=0xc0004398a8 sp=0xc000439860 pc=0x8300e2
sync.(*Once).doSlow(0x0?, 0x0?)
        /usr/local/go/src/sync/once.go:78 +0xab fp=0xc000439900 sp=0xc0004398a8 pc=0x4991ab
sync.(*Once).Do(0x0?, 0x0?)
        /usr/local/go/src/sync/once.go:69 +0x19 fp=0xc000439920 sp=0xc000439900 pc=0x4990d9
github.com/ollama/ollama/ml/backend/ggml/ggml/src.init.OnceFunc.func3()
        /usr/local/go/src/sync/oncefunc.go:32 +0x2d fp=0xc000439950 sp=0xc000439920 pc=0x83004d
github.com/ollama/ollama/llama.BackendInit()
        /home/arda/ruonan/ollama-internal/llama/llama.go:57 +0x16 fp=0xc000439960 sp=0xc000439950 pc=0x8349f6
github.com/ollama/ollama/runner/llamarunner.Execute({0xc0001aa020, 0xf, 0x10})
        /home/arda/ruonan/ollama-internal/runner/llamarunner/runner.go:853 +0x7d4 fp=0xc000439d08 sp=0xc000439960 pc=0x8f3bf4
github.com/ollama/ollama/runner.Execute({0xc0001aa010?, 0x0?, 0x0?})
        /home/arda/ruonan/ollama-internal/runner/runner.go:22 +0xd4 fp=0xc000439d30 sp=0xc000439d08 pc=0x979374
github.com/ollama/ollama/cmd.NewCLI.func2(0xc000271400?, {0x140da22?, 0x4?, 0x140da26?})
        /home/arda/ruonan/ollama-internal/cmd/cmd.go:1529 +0x45 fp=0xc000439d58 sp=0xc000439d30 pc=0x10d5b45
github.com/spf13/cobra.(*Command).execute(0xc000114f08, {0xc0004b4ff0, 0xf, 0xf})
        /home/arda/go/pkg/mod/github.com/spf13/[email protected]/command.go:940 +0x894 fp=0xc000439e78 sp=0xc000439d58 pc=0x5ff694
github.com/spf13/cobra.(*Command).ExecuteC(0xc0000e6908)
        /home/arda/go/pkg/mod/github.com/spf13/[email protected]/command.go:1068 +0x3a5 fp=0xc000439f30 sp=0xc000439e78 pc=0x5ffee5
github.com/spf13/cobra.(*Command).Execute(...)
        /home/arda/go/pkg/mod/github.com/spf13/[email protected]/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
        /home/arda/go/pkg/mod/github.com/spf13/[email protected]/command.go:985
main.main()
        /home/arda/ruonan/ollama-internal/main.go:12 +0x4d fp=0xc000439f50 sp=0xc000439f30 pc=0x10d65cd
runtime.main()
        /usr/local/go/src/runtime/proc.go:283 +0x28b fp=0xc000439fe0 sp=0xc000439f50 pc=0x45390b
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000439fe8 sp=0xc000439fe0 pc=0x48eca1

goroutine 2 gp=0xc000002e00 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000072fa8 sp=0xc000072f88 pc=0x486e0e
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:441
runtime.forcegchelper()
        /usr/local/go/src/runtime/proc.go:348 +0xb3 fp=0xc000072fe0 sp=0xc000072fa8 pc=0x453c53
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000072fe8 sp=0xc000072fe0 pc=0x48eca1
created by runtime.init.7 in goroutine 1
        /usr/local/go/src/runtime/proc.go:336 +0x1a

goroutine 18 gp=0xc0000aa380 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc00006e780 sp=0xc00006e760 pc=0x486e0e
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:441
runtime.bgsweep(0xc0000b8000)
        /usr/local/go/src/runtime/mgcsweep.go:316 +0xdf fp=0xc00006e7c8 sp=0xc00006e780 pc=0x43e45f
runtime.gcenable.gowrap1()
        /usr/local/go/src/runtime/mgc.go:204 +0x25 fp=0xc00006e7e0 sp=0xc00006e7c8 pc=0x4328c5
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00006e7e8 sp=0xc00006e7e0 pc=0x48eca1
created by runtime.gcenable in goroutine 1
        /usr/local/go/src/runtime/mgc.go:204 +0x66

goroutine 19 gp=0xc0000aa540 m=nil [GC scavenge wait]:
runtime.gopark(0x10000?, 0x15d2cc8?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc00006ef78 sp=0xc00006ef58 pc=0x486e0e
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:441
runtime.(*scavengerState).park(0x1ecc9a0)
        /usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc00006efa8 sp=0xc00006ef78 pc=0x43bea9
runtime.bgscavenge(0xc0000b8000)
        /usr/local/go/src/runtime/mgcscavenge.go:658 +0x59 fp=0xc00006efc8 sp=0xc00006efa8 pc=0x43c439
runtime.gcenable.gowrap2()
        /usr/local/go/src/runtime/mgc.go:205 +0x25 fp=0xc00006efe0 sp=0xc00006efc8 pc=0x432865
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00006efe8 sp=0xc00006efe0 pc=0x48eca1
created by runtime.gcenable in goroutine 1
        /usr/local/go/src/runtime/mgc.go:205 +0xa5

goroutine 34 gp=0xc000184380 m=nil [finalizer wait]:
runtime.gopark(0x1b8?, 0xc000002380?, 0x1?, 0x23?, 0xc000072688?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000072630 sp=0xc000072610 pc=0x486e0e
runtime.runfinq()
        /usr/local/go/src/runtime/mfinal.go:196 +0x107 fp=0xc0000727e0 sp=0xc000072630 pc=0x431887
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0000727e8 sp=0xc0000727e0 pc=0x48eca1
created by runtime.createfing in goroutine 1
        /usr/local/go/src/runtime/mfinal.go:166 +0x3d

goroutine 35 gp=0xc000184e00 m=nil [chan receive]:
runtime.gopark(0xc0001f5a40?, 0xc000116018?, 0x60?, 0x47?, 0x56cac8?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000304718 sp=0xc0003046f8 pc=0x486e0e
runtime.chanrecv(0xc000180310, 0x0, 0x1)
        /usr/local/go/src/runtime/chan.go:664 +0x445 fp=0xc000304790 sp=0xc000304718 pc=0x4232a5
runtime.chanrecv1(0x0?, 0x0?)
        /usr/local/go/src/runtime/chan.go:506 +0x12 fp=0xc0003047b8 sp=0xc000304790 pc=0x422e32
runtime.unique_runtime_registerUniqueMapCleanup.func2(...)
        /usr/local/go/src/runtime/mgc.go:1796
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
        /usr/local/go/src/runtime/mgc.go:1799 +0x2f fp=0xc0003047e0 sp=0xc0003047b8 pc=0x435a0f
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0003047e8 sp=0xc0003047e0 pc=0x48eca1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1794 +0x79

goroutine 36 gp=0xc000185180 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000304f38 sp=0xc000304f18 pc=0x486e0e
runtime.gcBgMarkWorker(0xc000181730)
        /usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc000304fc8 sp=0xc000304f38 pc=0x434d29
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc000304fe0 sp=0xc000304fc8 pc=0x434c05
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000304fe8 sp=0xc000304fe0 pc=0x48eca1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1339 +0x105

goroutine 20 gp=0xc0000aa700 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc00006f738 sp=0xc00006f718 pc=0x486e0e
runtime.gcBgMarkWorker(0xc000181730)
        /usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc00006f7c8 sp=0xc00006f738 pc=0x434d29
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc00006f7e0 sp=0xc00006f7c8 pc=0x434c05
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc00006f7e8 sp=0xc00006f7e0 pc=0x48eca1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1339 +0x105

goroutine 37 gp=0xc000185340 m=nil [GC worker (idle)]:
runtime.gopark(0x17e6a908c21f?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000305738 sp=0xc000305718 pc=0x486e0e
runtime.gcBgMarkWorker(0xc000181730)
        /usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc0003057c8 sp=0xc000305738 pc=0x434d29
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc0003057e0 sp=0xc0003057c8 pc=0x434c05
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0003057e8 sp=0xc0003057e0 pc=0x48eca1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1339 +0x105

goroutine 50 gp=0xc000102380 m=nil [GC worker (idle)]:
runtime.gopark(0x17e6a9090238?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000300738 sp=0xc000300718 pc=0x486e0e
runtime.gcBgMarkWorker(0xc000181730)
        /usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc0003007c8 sp=0xc000300738 pc=0x434d29
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc0003007e0 sp=0xc0003007c8 pc=0x434c05
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0003007e8 sp=0xc0003007e0 pc=0x48eca1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1339 +0x105

goroutine 51 gp=0xc000102540 m=nil [GC worker (idle)]:
runtime.gopark(0x17e6a9066e73?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000300f38 sp=0xc000300f18 pc=0x486e0e
runtime.gcBgMarkWorker(0xc000181730)
        /usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc000300fc8 sp=0xc000300f38 pc=0x434d29
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc000300fe0 sp=0xc000300fc8 pc=0x434c05
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000300fe8 sp=0xc000300fe0 pc=0x48eca1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1339 +0x105

goroutine 52 gp=0xc000102700 m=nil [GC worker (idle)]:
runtime.gopark(0x17e6a9088da0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000301738 sp=0xc000301718 pc=0x486e0e
runtime.gcBgMarkWorker(0xc000181730)
        /usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc0003017c8 sp=0xc000301738 pc=0x434d29
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc0003017e0 sp=0xc0003017c8 pc=0x434c05
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc0003017e8 sp=0xc0003017e0 pc=0x48eca1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1339 +0x105

goroutine 53 gp=0xc0001028c0 m=nil [GC worker (idle)]:
runtime.gopark(0x17e6a9087c16?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000301f38 sp=0xc000301f18 pc=0x486e0e
runtime.gcBgMarkWorker(0xc000181730)
        /usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc000301fc8 sp=0xc000301f38 pc=0x434d29
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc000301fe0 sp=0xc000301fc8 pc=0x434c05
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000301fe8 sp=0xc000301fe0 pc=0x48eca1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1339 +0x105

goroutine 38 gp=0xc000185500 m=nil [GC worker (idle)]:
runtime.gopark(0x1f7a980?, 0x1?, 0x8c?, 0xb6?, 0x0?)
        /usr/local/go/src/runtime/proc.go:435 +0xce fp=0xc000305f38 sp=0xc000305f18 pc=0x486e0e
runtime.gcBgMarkWorker(0xc000181730)
        /usr/local/go/src/runtime/mgc.go:1423 +0xe9 fp=0xc000305fc8 sp=0xc000305f38 pc=0x434d29
runtime.gcBgMarkStartWorkers.gowrap1()
        /usr/local/go/src/runtime/mgc.go:1339 +0x25 fp=0xc000305fe0 sp=0xc000305fc8 pc=0x434c05
runtime.goexit({})
        /usr/local/go/src/runtime/asm_amd64.s:1700 +0x1 fp=0xc000305fe8 sp=0xc000305fe0 pc=0x48eca1
created by runtime.gcBgMarkStartWorkers in goroutine 1
        /usr/local/go/src/runtime/mgc.go:1339 +0x105

rax    0x0
rbx    0xa4c6
rcx    0x7f96256a70fc
rdx    0x6
rdi    0xa4c6
rsi    0xa4c6
rbp    0x7f9626d33480
rsp    0x7ffd1e2d4bf0
r8     0x0
r9     0x7ffd1e2d4780
r10    0x8
r11    0x246
r12    0x26b735b0
r13    0x6
r14    0x0
r15    0x7f9612f38790
rip    0x7f96256a70fc
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
time=2025-09-15T19:29:49.167+08:00 level=ERROR source=sched.go:489 msg="error loading llama server" error="llama runner process has terminated: exit status 2"
[GIN] 2025/09/15 - 19:29:49 | 500 |   432.59435ms |       127.0.0.1 | POST     "/api/generate"
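
For reference, here is a minimal SYCL device-enumeration check I would try next (a sketch, assuming the oneAPI DPC++ compiler icpx is installed; the file name sycl_devices.cpp is arbitrary). It lists every platform and device the SYCL runtime can see and then requests the default GPU, which is essentially the step that fails in the runner with "No device of requested type available":

// sycl_devices.cpp — build with: icpx -fsycl sycl_devices.cpp -o sycl_devices
// (assumption: the Intel oneAPI DPC++ toolchain and GPU runtime are installed)
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    try {
        // Enumerate every platform/device visible to the SYCL runtime.
        for (const auto &platform : sycl::platform::get_platforms()) {
            std::cout << "Platform: "
                      << platform.get_info<sycl::info::platform::name>() << "\n";
            for (const auto &device : platform.get_devices()) {
                std::cout << "  Device: "
                          << device.get_info<sycl::info::device::name>()
                          << (device.is_gpu() ? " [GPU]" : "") << "\n";
            }
        }
        // Requesting the default GPU throws the same kind of sycl::exception
        // seen in the log above when no GPU device is available.
        sycl::device gpu{sycl::gpu_selector_v};
        std::cout << "Default GPU: "
                  << gpu.get_info<sycl::info::device::name>() << "\n";
    } catch (const sycl::exception &e) {
        std::cerr << "SYCL exception: " << e.what() << "\n";
        return 1;
    }
    return 0;
}

If this check (or sycl-ls from the oneAPI tools) shows no Intel GPU device, that would suggest the GPU driver / Level Zero runtime setup is the problem rather than ollama itself.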
