Releases: ggml-org/llama.cpp
b6638
b6635
ggml: riscv: add riscv spacemit backend (#15288)

* ggml: add spacemit backend (Change-Id: I249bdc043485d815a9c351867137bc1e27cc2e23)
* add new line at end of file (Change-Id: I889ed1c85fb45e62350ecde0c06f70450cadfbe2)
* add riscv zba extension limit (Change-Id: I321eb200f859751727afe5cae13074dfce2bb0ce)
* fixed for review comments, file renamed and format (Change-Id: Ia20b6ec24a36638e62e0fe07cf100916a7cce3ce)
* fixed for code format, after clang-format (Change-Id: I5dc33a0412da3d3f2d77075d8939185d3009eca2)
* use _Float16 instead of __fp16 (Change-Id: I039fb02bb95270e641bc4442204e658735859d43)
* add ci for riscv64-spacemit-ime-native (Change-Id: I711c1033061df1a289ea77891b2997599dfe8279)
* update debian-13-riscv64-spacemit-ime-native ci label (Change-Id: Ifb2b891e2fca57b5da604fce2ac255f27731179a)
* remove license comment for spacemit ime (Change-Id: If0dc3ca30a958631ccca0a28b62e0b825f9fb0c3)
* upgrade binutils for gcc ime (Change-Id: Ibf2fa74c1064408974cb5b45f044d40987e5fb45)
* add spacemit ime cross jobs (Change-Id: I80d74909941d41cb9cd09e51d8baf01c985cbfc6)
* remove native compile for riscv64-spacemit-ime (Change-Id: I01920afafdc73fa7424014fd648d243f8ec9e25e)
* ci : add caching for spacemit ime cross toolchain (Change-Id: Ic54a192019a2fd982bbd58225ce3bbc38f4053de)
* ci: bug fixed for cache path and env (Change-Id: I28c42e10b6fff053bb6580926ca2353448cb042a)
* Update .github/workflows/build-linux-cross.yml for cache path (Co-authored-by: Sigbjørn Skjæret <[email protected]>)
* bugfixed for build-linux-cross.yml, syntax error (Co-authored-by: Sigbjørn Skjæret <[email protected]>)

Co-authored-by: cailinxi <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
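On the `_Float16` vs `__fp16` point: `__fp16` is primarily an Arm storage-format extension, while `_Float16` is the standards-track half-precision type that GCC and Clang also expose on RISC-V toolchains with half-precision support. The sketch below shows how a backend might pick a half type portably; the `spacemit_fp16_t` alias and the fallback path are assumptions for illustration, not the backend's actual code.

```cpp
// Illustrative only: selecting a half-precision storage type portably.
#include <cstdint>

#if defined(__riscv) && defined(__GNUC__)
// Recent GCC/Clang expose the ISO/IEC TS 18661-3 _Float16 type on RISC-V;
// __fp16 is mainly an Arm extension and is not reliably available here.
typedef _Float16 spacemit_fp16_t;
#else
// Fallback: carry the raw 16-bit payload and convert explicitly elsewhere.
typedef uint16_t spacemit_fp16_t;
#endif

#if defined(__riscv) && defined(__GNUC__)
// With a native half type, conversion to float is an ordinary widening cast.
static inline float spacemit_fp16_to_f32(spacemit_fp16_t x) {
    return (float) x;
}
#endif
```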
b6634
sync : ggml
b6628
ggml-backend : add root cause in error message if loading backend lib…
b6627
ggml : check cuda and metal argsort limits and add test (#16323)

* check cuda argsort limits and add test
* add metal check
b6624
llama-cli: prevent spurious assistant token (#16202)

* tools/main: llama-cli: prevent spurious assistant token (#13402)

  During prompt ingestion, prompt tokens are accepted into the sampler history (for repetition penalties). The conversation-mode path then appended `common_sampler_last(smpl)` to `assistant_ss` before any new token was sampled. At that point, "last" was a prompt-side token (e.g., an input prefix), so the assistant chat message began with an extra piece.

  Fix: append to `assistant_ss` only for a newly sampled (non-EOG) token. This affects only chat message assembly (`assistant_ss` / `chat_msgs` / `common_chat_format_single`); terminal stdout is unchanged. Sampling order/logits are unchanged.

  Fixes #13402.

* Update tools/main/main.cpp (Co-authored-by: Sigbjørn Skjæret <[email protected]>)
* tools/main: remove outdated comment

Signed-off-by: Vinkal Chudgar <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
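A minimal, self-contained sketch of the control flow the fix describes, using a toy sampler in place of the real llama.cpp sampler API: prompt tokens enter the sampler history but are never appended to the assistant message buffer, and only newly sampled, non-EOG tokens are. This is an illustration of the described behavior, not the actual `tools/main/main.cpp` code.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

static const int EOG = -1; // toy end-of-generation sentinel

struct ToySampler {
    std::vector<int> history;                  // mirrors the repetition-penalty history
    std::vector<int> scripted = {10, 11, EOG}; // pretend model output
    size_t pos = 0;

    void accept(int tok) { history.push_back(tok); } // prompt and sampled tokens both land here
    int  last() const { return history.back(); }     // analogous to common_sampler_last(smpl)
    int  sample() { return scripted[pos++]; }
};

static std::string token_to_piece(int tok) { return "<" + std::to_string(tok) + ">"; }

int main() {
    ToySampler smpl;
    std::ostringstream assistant_ss; // assistant chat message being assembled

    // Prompt ingestion: tokens enter the sampler history but are NOT appended
    // to assistant_ss (appending smpl.last() here was the bug).
    for (int tok : {1, 2, 3}) {
        smpl.accept(tok);
    }

    // Generation: append only tokens that were actually sampled and are not EOG.
    while (true) {
        const int tok = smpl.sample();
        if (tok == EOG) {
            break;
        }
        smpl.accept(tok);
        assistant_ss << token_to_piece(tok);
    }

    std::cout << assistant_ss.str() << "\n"; // prints "<10><11>" with no prompt-side piece
    return 0;
}
```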
b6623
perplexity : show more kl-divergence data (#16321)

Adds additional percentile data displayed in the output of `llama-perplexity --kl-divergence`:
- Added the 95th percentile (mirroring the existing 5th percentile)
- Added the 0.1 percentile (mirroring the existing 99.9 percentile)
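For context, a percentile over per-token KL-divergence values can be computed with a simple sorted-rank interpolation. The helper below is a generic sketch for illustration; the interpolation choice and the sample data are assumptions, not llama-perplexity's actual implementation.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Percentile by linear interpolation between the two closest sorted ranks.
// Takes the vector by value so the caller's data is left untouched.
static double percentile(std::vector<double> v, double p) {
    std::sort(v.begin(), v.end());
    const double rank = p / 100.0 * (double) (v.size() - 1);
    const size_t lo   = (size_t) rank;
    const size_t hi   = std::min(lo + 1, v.size() - 1);
    const double frac = rank - (double) lo;
    return v[lo] + frac * (v[hi] - v[lo]);
}

int main() {
    // Pretend per-token KL-divergence samples (normally produced by comparing
    // a quantized model's logits against a reference run).
    std::vector<double> kld = {0.01, 0.02, 0.05, 0.03, 0.08, 0.20, 0.01, 0.04};

    for (double p : {0.1, 5.0, 95.0, 99.9}) {
        std::printf("KLD %5.1f%% percentile: %.4f\n", p, percentile(kld, p));
    }
    return 0;
}
```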
b6622
ggml : fix dependencies for ggml_set_rows (#16318)
b6621
vulkan: Fix validation failure in quantized flash attention (#16292)
b6619
common : fix reasoning before forced tool call via tool_choice = requ…