Releases · ggml-org/llama.cpp
b5272
imatrix: fix oob writes if src1 is not contiguous (#13286)
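For context, llama-imatrix collects per-tensor activation statistics from a calibration run, and the fix above guards that collection against non-contiguous inputs. A minimal usage sketch, with model.gguf and calibration.txt as placeholder names:

```sh
# Collect activation statistics over a calibration text file.
./llama-imatrix -m model.gguf -f calibration.txt -o imatrix.dat

# The resulting matrix can then guide a low-bit quantization.
./llama-quantize --imatrix imatrix.dat model.gguf model-q4_k_m.gguf Q4_K_M
```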
b5271
clip : revert the change of BOI/EOI token for GLM-edge (⚠️ breaking change)
b5270
llama : Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843)
b5269
llama : move end-user examples to tools directory (#13249) Co-authored-by: Xuan Son Nguyen <[email protected]>
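For users building from source, this moves the end-user programs from examples/ to tools/ in the source tree; a typical CMake build still places the binaries under build/bin. A sketch of the usual workflow:

```sh
# Standard CMake build; output binaries land in build/bin
# regardless of the examples/ -> tools/ source move.
cmake -B build
cmake --build build --config Release -j
./build/bin/llama-cli --version
```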
b5267
context : fix reorder logic (#13267) ggml-ci
b5266
ggml : Enable MMA for BF16 in llamafile_sgemm (#13148) This patch upstreams llamafile's CPU matrix-multiplication kernels for ppc64le, using MMA builtins for the BF16 data type. The change yields 9x-40x gains in total speed S t/s (i.e. all tokens / total time) across the various batch sizes tested with the llama-batched-bench benchmark. The patch was tested with Meta-Llama-3-8B and Mistral-7B models (BF16 models generated from the corresponding FP32 models with llama-quantize) on an IBM POWER10 machine. Signed-off-by: Shalini Salomi Bodapati <[email protected]>
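The setup described above can be approximated as follows; the file names and batch-size list are illustrative, not the exact configuration from the patch:

```sh
# Produce a BF16 GGUF from an FP32 one (file names are placeholders).
./llama-quantize model-f32.gguf model-bf16.gguf BF16

# Measure throughput at several parallel batch sizes:
# -npp/-ntg set prompt/generation lengths, -npl the batch sizes.
./llama-batched-bench -m model-bf16.gguf -c 4096 -npp 128 -ntg 128 -npl 1,2,4,8
```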
b5265
llama-model : support Qwen2 embedding models and pooling_mode_lasttoken
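With last-token pooling in place, a Qwen2-based embedding model can be queried through llama-embedding; the model path below is a placeholder, and --pooling last selects the new mode:

```sh
# Sketch: embed a sentence with last-token pooling
# (qwen2-embedding.gguf is a placeholder path).
./llama-embedding -m qwen2-embedding.gguf --pooling last -p "hello world"
```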
b5261
llama : orion rope type is neox (#13261)
b5260
llama : plamo rope type is neox (#13260)
b5259
llama-chat : reset glmedge chat template (#13253)
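To exercise the restored template, llama-cli can select a built-in chat template by name; that the alias is exactly "glmedge" is an assumption here, not confirmed by the release note:

```sh
# Sketch: chat with the GLM-edge template; the template name "glmedge"
# and the model path are assumptions.
./llama-cli -m glm-edge.gguf --chat-template glmedge -cnv
```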