Releases: ggml-org/llama.cpp

b5272

03 May 23:40
3e959f0
imatrix: fix oob writes if src1 is not contiguous (#13286)

b5271

03 May 18:54
36667c8
clip : revert the change of BOI/EOI token for GLM-edge (⚠️ breaking c…

b5270

03 May 16:31
3bf785f
llama : Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843)

b5269

02 May 19:23
1d36b36
llama : move end-user examples to tools directory (#13249)


Co-authored-by: Xuan Son Nguyen <[email protected]>

b5267

02 May 18:58
a75cb30
context : fix reorder logic (#13267)

b5266

02 May 18:36
3f3769b
ggml : Enable MMA for BF16 in llamafile_sgemm (#13148)

This patch upstreams llamafile's CPU matrix-multiplication kernels for ppc64le, using MMA builtins for the BF16 data type.

This change yields 9x-40x gains in total speed S t/s (i.e. all tokens / total time) across the various batch sizes tested with the llama-batched-bench benchmark.

The patch was tested with the Meta-Llama-3-8B and Mistral-7B models (BF16 models generated with llama-quantize from the corresponding FP32 models) on an IBM POWER10 machine.

Signed-off-by: Shalini Salomi Bodapati <[email protected]>

b5265

02 May 17:50
2f56761
llama-model : support Qwen2 embedding models and pooling_mode_lasttok…

b5261

02 May 13:17
cb06a3c
llama : orion rope type is neox (#13261)

b5260

02 May 12:37
626083f
llama : plamo rope type is neox (#13260)

b5259

02 May 09:51
2af6880
llama-chat : reset glmedge chat template (#13253)

* reset glmedge chat template
* fix glmedge chat template