Releases: ggml-org/llama.cpp

b6638

30 Sep 00:43
a74a0d6
tests: override test_set_rows::max_nmse_err to allow for occasional r…

b6635

29 Sep 20:25
b77e6c1
ggml: riscv: add riscv spacemit backend (#15288)

* ggml: add spacemit backend

Change-Id: I249bdc043485d815a9c351867137bc1e27cc2e23

* add new line at end of file

Change-Id: I889ed1c85fb45e62350ecde0c06f70450cadfbe2

* add riscv zba extension limit

Change-Id: I321eb200f859751727afe5cae13074dfce2bb0ce

* fixed for review comments, file renamed and format

Change-Id: Ia20b6ec24a36638e62e0fe07cf100916a7cce3ce

* fixed for code format, after clang-format

Change-Id: I5dc33a0412da3d3f2d77075d8939185d3009eca2

* use _Float16 instead of __fp16

Change-Id: I039fb02bb95270e641bc4442204e658735859d43

* add ci for riscv64-spacemit-ime-native

Change-Id: I711c1033061df1a289ea77891b2997599dfe8279

* update debian-13-riscv64-spacemit-ime-native ci label

Change-Id: Ifb2b891e2fca57b5da604fce2ac255f27731179a

* remove license comment for spacemit ime

Change-Id: If0dc3ca30a958631ccca0a28b62e0b825f9fb0c3

* upgrade binutils for gcc ime

Change-Id: Ibf2fa74c1064408974cb5b45f044d40987e5fb45

* add spacemit ime cross jobs

Change-Id: I80d74909941d41cb9cd09e51d8baf01c985cbfc6

* remove native compile for riscv64-spacemit-ime

Change-Id: I01920afafdc73fa7424014fd648d243f8ec9e25e

* ci : add caching for spacemit ime cross toolchain

Change-Id: Ic54a192019a2fd982bbd58225ce3bbc38f4053de

* ci: bug fixed for cache path and env

Change-Id: I28c42e10b6fff053bb6580926ca2353448cb042a

* Update .github/workflows/build-linux-cross.yml for cache path

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* bugfixed for build-linux-cross.yml, syntax error

Co-authored-by: Sigbjørn Skjæret <[email protected]>

---------

Co-authored-by: cailinxi <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>

b6634

29 Sep 20:20
sync : ggml

b6628

29 Sep 11:38
02463ab
ggml-backend : add root cause in error message if loading backend lib…

b6627

29 Sep 09:33
adc7634
ggml : check cuda and metal argsort limits and add test (#16323)

* check cuda argsort limits and add test

* add metal check

b6624

29 Sep 07:53
2f61c0f
llama-cli: prevent spurious assistant token (#16202)

* tools/main: llama-cli: prevent spurious assistant token (#13402)

During prompt ingestion, prompt tokens are accepted into the sampler history (for repetition penalties). The conversation-mode path then appended `common_sampler_last(smpl)` to `assistant_ss` before any new token was sampled. At that point, "last" was a prompt-side token (e.g., an input prefix), so the assistant chat message began with an extra piece.

Fix: append to `assistant_ss` only for a newly sampled (non-EOG) token. This affects only chat message assembly (`assistant_ss` / `chat_msgs` / `common_chat_format_single`); terminal stdout is unchanged. Sampling order/logits are unchanged.

Fixes #13402.
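
The corrected assembly rule can be sketched in isolation. This is a minimal illustration, not the code in `tools/main/main.cpp`: the `Token` struct and `assemble_assistant_message` are hypothetical stand-ins, and only `assistant_ss` borrows its name from the description above. Prompt tokens still enter the sampler history, but only newly sampled non-EOG tokens reach the assistant message.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a decoded token; not a llama.cpp type.
struct Token {
    std::string piece;  // decoded text of the token
    bool is_eog;        // true for an end-of-generation token
};

// Builds the assistant chat message from sampled tokens only.
// Prompt tokens are accepted into the history (as they would be for
// repetition penalties) but are never appended to the message.
std::string assemble_assistant_message(const std::vector<Token>& prompt,
                                       const std::vector<Token>& sampled) {
    std::vector<Token> history;  // stand-in for the sampler history
    std::string assistant_ss;

    for (const Token& t : prompt) {
        history.push_back(t);  // history only; nothing is appended here
    }
    for (const Token& t : sampled) {
        history.push_back(t);
        if (t.is_eog) {
            break;  // EOG ends the turn and is not part of the message
        }
        assistant_ss += t.piece;  // append only a newly sampled token
    }
    assert(history.size() >= prompt.size());
    return assistant_ss;
}
```

With a prompt ending in an input prefix and sampled tokens `Hello`, `!`, EOG, the sketch returns `Hello!` with no prompt-side piece at the front.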

Signed-off-by: Vinkal Chudgar <[email protected]>

* Update tools/main/main.cpp

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* tools/main: remove outdated comment

Signed-off-by: Vinkal Chudgar <[email protected]>

---------

Signed-off-by: Vinkal Chudgar <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>

b6623

29 Sep 06:55
3ffd0fa
perplexity : show more kl-divergence data (#16321)

Adds additional percentile data to the output of `llama-perplexity --kl-divergence`:
- Added 95 percentile (mirroring existing 5 percentile)
- Added 0.1 percentile (mirroring existing 99.9 percentile)
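
For readers unfamiliar with the statistic, the four percentiles can be computed from a list of KL-divergence values roughly as follows. This is a sketch under assumptions, not the `llama-perplexity` implementation: `percentile`, `KldPercentiles`, and `summarize_kld` are hypothetical names, and the linear interpolation between neighbouring samples is a choice made here that the tool may not share.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Linearly interpolated percentile of an ascending-sorted sample.
// p is given in percent, in the range [0, 100].
double percentile(const std::vector<double>& sorted, double p) {
    assert(!sorted.empty());
    assert(p >= 0.0 && p <= 100.0);
    const double pos = (p / 100.0) * static_cast<double>(sorted.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
    const std::size_t hi = std::min(lo + 1, sorted.size() - 1);
    const double frac = pos - static_cast<double>(lo);
    return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
}

// The two previously reported percentiles plus the two mirrored ones.
struct KldPercentiles {
    double p0_1;   // 0.1 percentile (mirrors 99.9)
    double p5;     // 5 percentile
    double p95;    // 95 percentile (mirrors 5)
    double p99_9;  // 99.9 percentile
};

// Takes the values by copy so the caller's ordering is left untouched.
KldPercentiles summarize_kld(std::vector<double> kld) {
    std::sort(kld.begin(), kld.end());
    return {percentile(kld, 0.1), percentile(kld, 5.0),
            percentile(kld, 95.0), percentile(kld, 99.9)};
}
```

Reporting both tails symmetrically makes it easier to see whether the divergence distribution is skewed toward a few badly matched tokens.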

b6622

29 Sep 06:03
a4a0aa5
ggml : fix dependencies for ggml_set_rows (#16318)

b6621

29 Sep 05:15
92cd103
vulkan: Fix validation failure in quantized flash attention (#16292)

b6619

28 Sep 19:44
bd0af02
common : fix reasoning before forced tool call via tool_choice = requ…