
Commit 3fcad1b

Mohammed Noumaan Ahamed and jklj077 authored

Update llama.cpp.rst (#739)

* Update llama.cpp.rst

  Llama.cpp just updated their program names; I've updated the article to use the new names:

  - quantize -> llama-quantize
  - main -> llama-cli
  - simple -> llama-simple

  [Check out the PR](ggml-org/llama.cpp#7809)

* Updated llama.cpp.rst (removed -cml)

* Update llama.cpp.rst

Co-authored-by: Ren Xuancheng <[email protected]>

1 parent 0c28a94 commit 3fcad1b
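The rename described in the commit message can be summarized as a simple mapping (a reference sketch only; the binaries live in your local llama.cpp build directory):

```shell
# Old -> new program names in llama.cpp after upstream PR
# ggml-org/llama.cpp#7809, as stated in the commit message above.
printf '%s -> %s\n' \
    quantize llama-quantize \
    main     llama-cli \
    simple   llama-simple
```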

File tree

1 file changed: +7 −3 lines


docs/source/run_locally/llama.cpp.rst

Lines changed: 7 additions & 3 deletions
@@ -55,14 +55,18 @@ Then you can run the model with the following command:
 
 .. code:: bash
 
-   ./main -m qwen2-7b-instruct-q5_k_m.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
+   ./llama-cli -m qwen2-7b-instruct-q5_k_m.gguf \
+     -n 512 -co -i -if -f prompts/chat-with-qwen.txt \
+     --in-prefix "<|im_start|>user\n" \
+     --in-suffix "<|im_end|>\n<|im_start|>assistant\n" \
+     -ngl 80 -fa
 
 where ``-n`` refers to the maximum number of tokens to generate. There
 are other hyperparameters for you to choose and you can run
 
 .. code:: bash
 
-   ./main -h
+   ./llama-cli -h
 
 to figure them out.
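The short flags in the new ``llama-cli`` invocation are terse; the summary below is our reading of ``llama-cli``'s help output and may differ slightly across llama.cpp versions, so treat your build's ``./llama-cli -h`` as authoritative:

```shell
# Hedged flag reference for the llama-cli command in the diff above.
cat <<'EOF'
-n 512        maximum number of tokens to generate
-co           colorize output
-i            interactive mode
-if           interactive-first: wait for user input before generating
-f FILE       read the initial prompt from FILE
--in-prefix   string prepended to each user turn
--in-suffix   string appended after each user turn
-ngl 80       number of model layers to offload to the GPU
-fa           enable flash attention
EOF
```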

@@ -92,7 +96,7 @@ Then you can run the test with the following command:
 
 .. code:: bash
 
-   ./perplexity -m models/7B/ggml-model-q4_0.gguf -f wiki.test.raw
+   ./llama-perplexity -m <gguf_path> -f wiki.test.raw
 
 where the output is like
