Commit be79d9f

llama-bench: add --devices and --list-devices support (#16039)
* llama-bench: add --devices support
  - Support --devices same as llama-server
  - Provide for benchmarking different device combinations
  - Include --list-devices like llama-server for convenience
* fix: field display ordering restored
* fix: integrated the rpc devices
  - aimed to mimic the server as much as possible
* cleanup: defaults for list-devices
  - handle dup device listing with RPC
* cleanup: remove dup device load calls
* docs: update llama-bench
  - added the recently added n-cpu-moe option to the docs while in there
* llama-bench: rpc device simplification
* rpc servers unify with other devices earlier, simplifying code
* --list-devices made stateless and simpler
* various cleanup
1 parent f432d8d commit be79d9f

2 files changed, +182 -76 lines changed

tools/llama-bench/README.md

Lines changed: 4 additions & 1 deletion
@@ -30,8 +30,10 @@ options:
   --delay <0...N> (seconds)                 delay between each test (default: 0)
   -o, --output <csv|json|jsonl|md|sql>      output format printed to stdout (default: md)
   -oe, --output-err <csv|json|jsonl|md|sql> output format printed to stderr (default: none)
+  --list-devices                            list available devices and exit
   -v, --verbose                             verbose output
   --progress                                print test progress indicators
+  -rpc, --rpc <rpc_servers>                 register RPC devices (comma separated)
 
 test parameters:
   -m, --model <filename>                    (default: models/7B/ggml-model-q4_0.gguf)
@@ -48,11 +50,12 @@ test parameters:
   --cpu-strict <0|1>                  (default: 0)
   --poll <0...100>                    (default: 50)
   -ngl, --n-gpu-layers <n>            (default: 99)
-  -rpc, --rpc <rpc_servers>           (default: none)
+  -ncmoe, --n-cpu-moe <n>             (default: 0)
   -sm, --split-mode <none|layer|row>  (default: layer)
   -mg, --main-gpu <i>                 (default: 0)
   -nkvo, --no-kv-offload <0|1>        (default: 0)
   -fa, --flash-attn <0|1>             (default: 0)
+  -dev, --device <dev0/dev1/...>      (default: auto)
   -mmp, --mmap <0|1>                  (default: 1)
   -embd, --embeddings <0|1>           (default: 0)
   -ts, --tensor-split <ts0/ts1/..>    (default: 0)
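
The new `-dev, --device <dev0/dev1/...>` entry and the commit's stated goal of "benchmarking different device combinations" suggest how a value for this option might be interpreted. The following Python sketch is only an illustration, not the C++ code from this commit. It assumes that `/` separates the devices within one combination (as the usage string shows), that `,` separates alternative combinations to benchmark (an assumption based on how llama-bench usually accepts multiple values per test parameter), and that `auto` means the default device selection. The function name `parse_device_arg` and the device names `CUDA0` and `CUDA1` are hypothetical; real device names would come from `--list-devices`.

```python
def parse_device_arg(value: str) -> list[list[str]]:
    """Split a -dev value into a list of device combinations.

    Assumptions (not taken from the llama-bench source):
      - ',' separates alternative combinations, one benchmark run each
      - '/' separates the devices inside a single combination
      - 'auto' means default device selection, returned as an empty list
    """
    combos = []
    for combo in value.split(","):
        combo = combo.strip()
        if not combo:
            raise ValueError(f"empty device combination in {value!r}")
        if combo == "auto":
            combos.append([])
            continue
        devices = [d.strip() for d in combo.split("/")]
        if any(not d for d in devices):
            raise ValueError(f"empty device name in {combo!r}")
        combos.append(devices)
    return combos


if __name__ == "__main__":
    # Hypothetical device names, used here for illustration only.
    print(parse_device_arg("CUDA0,CUDA0/CUDA1,auto"))
    # prints [['CUDA0'], ['CUDA0', 'CUDA1'], []]
```

Under these assumptions, a value such as `CUDA0,CUDA0/CUDA1,auto` would describe three runs: one on a single device, one on a pair of devices, and one with the default selection.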
