Description
Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.
After the b2950 patch, RPC functionality is broken. When offloading to 3 machines, the first RPC server crashes with the message below. Reverting to b2949 fixes the problem.
ll_startrpc
create_backend: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: CUDA_USE_TENSOR_CORES: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1070, compute capability 6.1, VMM: yes
Starting RPC server on 0.0.0.0:50052, backend memory: 8022 MB
Accepted client connection, free_mem=8412266496, total_mem=8500477952
GGML_ASSERT: /usr/local/src/ai/llamacpp/llama.cpp/ggml-backend.c:226: offset + size <= ggml_nbytes(tensor) && "tensor write out of bounds"
[New LWP 11678]
[New LWP 11684]
[New LWP 11685]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f66930dc3c7 in wait4 () from /lib64/libc.so.6
#0 0x00007f66930dc3c7 in wait4 () from /lib64/libc.so.6
#1 0x0000000000411f4b in ggml_print_backtrace ()
#2 0x000000000046639a in ggml_backend_tensor_set ()
#3 0x0000000000541d20 in start_rpc_server ()
#4 0x0000000000406ebc in main ()
[Inferior 1 (process 11677) detached]
/usr/local/bin/ll_startrpc: line 14: 11677 Aborted rpc-server -H 0.0.0.0 -p 50052
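For context on the failure: the assertion at ggml-backend.c:226 fires inside ggml_backend_tensor_set when a write request targets a byte range outside the destination tensor. The snippet below is only a minimal sketch of that invariant, not the actual ggml source; the function name tensor_set_sketch and the numbers in main are placeholders for the offset/size values the RPC server receives from the client.

```c
/* Sketch of the bounds check that aborts the RPC server.
 * Illustration only, not the real ggml-backend.c code: the actual function
 * is ggml_backend_tensor_set(), and ggml_nbytes() computes the tensor's
 * total size in bytes from its type and dimensions. */
#include <assert.h>
#include <stddef.h>

static void tensor_set_sketch(size_t tensor_nbytes, /* ggml_nbytes(tensor)  */
                              size_t offset,        /* write offset (bytes) */
                              size_t size)          /* write length (bytes) */
{
    /* The RPC server deserializes offset and size from the client request.
     * If the deserialized values no longer match the locally allocated
     * tensor (e.g. if the serialization changed between releases), the
     * write would exceed the tensor's allocation and this check fails. */
    assert(offset + size <= tensor_nbytes && "tensor write out of bounds");

    /* ... copy `size` bytes into the backend buffer at `offset` ... */
}

int main(void) {
    /* Hypothetical numbers: a 4096-byte tensor. */
    tensor_set_sketch(4096, 0, 4096);   /* in bounds: passes */
    tensor_set_sketch(4096, 4000, 512); /* out of bounds: aborts, as in the log */
    return 0;
}
```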