Conversation

ikawrakow (Contributor)

Nothing earth-shattering, but I noticed that I had not brought over all the changes I made to Q4_K and Q5_K quantization in my repo, so here they are. We get small PPL improvements. E.g., for Mistral-7B with a context of 512 and an imatrix from wiki.train.raw (QError is defined as PPL(Q)/PPL(fp16) - 1, where PPL(fp16) = 5.6924 for Mistral-7B):

| Quantization | PPL (Master) | PPL (PR) | QError(PR)/QError(Master) |
|---|---|---|---|
| Q4_K_S | 5.7428 | 5.7375 | 0.895 |
| Q5_K_S | 5.7107 | 5.7052 | 0.699 |
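Not part of the PR, just a minimal sketch showing how the last column follows from the QError definition above; the PPL values are copied from the table:

```cpp
// Recompute QError(PR)/QError(Master) from the table above,
// using QError = PPL(Q)/PPL(fp16) - 1.
#include <cstdio>

int main() {
    const double ppl_fp16 = 5.6924;  // Mistral-7B fp16 baseline from the table
    struct Row { const char * name; double ppl_master; double ppl_pr; };
    const Row rows[] = {
        { "Q4_K_S", 5.7428, 5.7375 },
        { "Q5_K_S", 5.7107, 5.7052 },
    };
    for (const Row & r : rows) {
        const double qerr_master = r.ppl_master / ppl_fp16 - 1;
        const double qerr_pr     = r.ppl_pr     / ppl_fp16 - 1;
        printf("%s: QError(PR)/QError(Master) = %.3f\n", r.name, qerr_pr / qerr_master);
    }
    return 0;
}
```

This prints 0.895 for Q4_K_S and 0.699 for Q5_K_S, matching the ratios in the table.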

@BarfingLemurs (Contributor)

@ikawrakow could you add some PR links to recent quantization changes in the readme?

Here are some that were added a while back: https://github.com/ggerganov/llama.cpp#quantization

@ikawrakow (Contributor, Author)

> @ikawrakow could you add some PR links to recent quantization changes in the readme?
>
> Here are some that were added a while back: https://github.com/ggerganov/llama.cpp#quantization

See #5366

@ikawrakow ikawrakow merged commit f57fadc into master Feb 6, 2024
@ikawrakow ikawrakow deleted the ik/q4k_tuning branch February 6, 2024 15:28
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024
* Q4_K: slightly better quantization

* Q5_K: slightly better quantization

---------

Co-authored-by: Iwan Kawrakow <[email protected]>
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024
* Q4_K: slightly better quantization

* Q5_K: slightly better quantization

---------

Co-authored-by: Iwan Kawrakow <[email protected]>