Fix parameter loading in PLaMo2 #16075
Conversation
> This looks like it will break …

You're right. Thanks. I updated this PR as in 10af12c to support existing GGUFs converted previously. I confirmed that the latest commit successfully runs a GGUF, mmnga/plamo-2-translate-gguf, without any problems.
AFAICT there are no functional changes here besides making `n_embd_head_k/v` bound to `hidden_size_per_head` instead of calculated from `hidden_size` / `num_attention_heads`. `n_head` is not used in mamba layers anyway, so what is the motivation for the PR?
src/llama-model.cpp (outdated)

```cpp
// Load attention parameters
ml.get_key(LLM_KV_ATTENTION_KEY_LENGTH, hparams.n_embd_head_k, false);
ml.get_key(LLM_KV_ATTENTION_VALUE_LENGTH, hparams.n_embd_head_v, false);
```
Suggested change (remove these lines):

```cpp
// Load attention parameters
ml.get_key(LLM_KV_ATTENTION_KEY_LENGTH, hparams.n_embd_head_k, false);
ml.get_key(LLM_KV_ATTENTION_VALUE_LENGTH, hparams.n_embd_head_v, false);
```
Already done here:
Lines 555 to 580 in 10af12c
```cpp
// non-transformer models do not have attention heads
if (hparams.n_head() > 0) {
    // gpt-neox n_rot = rotary_pct * (n_embd / n_head)
    // gpt-j n_rot = rotary_dim

    hparams.n_embd_head_k = hparams.n_embd / hparams.n_head();
    ml.get_key(LLM_KV_ATTENTION_KEY_LENGTH, hparams.n_embd_head_k, false);

    hparams.n_embd_head_v = hparams.n_embd / hparams.n_head();
    ml.get_key(LLM_KV_ATTENTION_VALUE_LENGTH, hparams.n_embd_head_v, false);

    // sanity check for n_rot (optional)
    hparams.n_rot = hparams.n_embd_head_k;

    ml.get_key(LLM_KV_ROPE_DIMENSION_COUNT, hparams.n_rot, false);

    if (arch == LLM_ARCH_LLAMA || arch == LLM_ARCH_DECI || arch == LLM_ARCH_FALCON) {
        if (hparams.n_rot != hparams.n_embd_head_k) {
            throw std::runtime_error(format("invalid n_rot: %u, expected %u", hparams.n_rot, hparams.n_embd_head_k));
        }
    }
} else {
    hparams.n_rot = 0;
    hparams.n_embd_head_k = 0;
    hparams.n_embd_head_v = 0;
}
```
@CISC Ah, I applied your suggestion but noticed that the problem comes from the following two lines:

Line 560 in 10af12c:

```cpp
hparams.n_embd_head_k = hparams.n_embd / hparams.n_head();
```

Line 563 in 10af12c:

```cpp
hparams.n_embd_head_v = hparams.n_embd / hparams.n_head();
```
The main purpose of this PR is to support some variants of the PLaMo2 model that were created by pruning larger models and therefore have `n_embd_head_k` larger than `n_embd / n_head`. So let me roll back these changes to support such cases in those PLaMo2 variants.
> The main purpose of this PR is to support some variants of the PLaMo2 model that were created by pruning larger models and therefore have `n_embd_head_k` larger than `n_embd / n_head`. So let me roll back these changes to support such cases in those PLaMo2 variants.
But they would have to have the metadata present to work then, so no issue.
@mitmul gentle ping
@CISC Sorry for my late reaction. Um, could you explain what you meant by this?

> But they would have to have the metadata present to work then, so no issue.
For example, pfnet/plamo-2.1-2b-cpt has

- `hidden_size` = 2048 (for `n_embd`)
- `hidden_size_per_head` = 128 (for `n_embd_head_k` and `n_embd_head_v`)
- `num_key_value_heads` = 32 (for `n_head_arr` in `il % 2 == 0` layers (attention layers, not mamba layers)),

so that `n_embd_head_k/v` != `n_embd / n_head`.
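Spelling out the arithmetic with the values above (a standalone sanity check, not llama.cpp code; the variable names are only illustrative):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    const uint32_t n_embd        = 2048; // hidden_size
    const uint32_t n_head        = 32;   // the head count written into n_head_arr
    const uint32_t head_dim_json = 128;  // hidden_size_per_head

    const uint32_t head_dim_derived = n_embd / n_head; // 2048 / 32 = 64

    // 64 != 128: deriving n_embd_head_k/v from n_embd / n_head gives the wrong value here
    printf("derived = %u, from config.json = %u\n", head_dim_derived, head_dim_json);
    return 0;
}
```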
Then, with the current `load_hparams()` it goes through the `else` block here:

Lines 577 to 581 in 132d673

```cpp
} else {
    hparams.n_rot = 0;
    hparams.n_embd_head_k = 0;
    hparams.n_embd_head_v = 0;
}
```
because `hparams.n_head()` will return 0, which comes from the first element of `n_head_arr`, indicating a mamba layer.
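To illustrate why the first element matters, here is a small self-contained sketch; `SketchHParams` is a hypothetical stand-in that only mimics the accessor-with-default-layer pattern, not the real `llama_hparams`:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Stand-in for the per-layer head-count array kept in the hparams.
struct SketchHParams {
    std::vector<uint32_t> n_head_arr;

    // Mirrors the accessor pattern: with no argument it reads layer 0.
    uint32_t n_head(uint32_t il = 0) const {
        return n_head_arr[il];
    }
};

int main() {
    // Hybrid layout where the first layer is a mamba layer (0 heads)
    // and attention layers carry 32 heads.
    const SketchHParams hp = { { 0, 32, 0, 32 } };

    printf("n_head()  = %u\n", hp.n_head());   // 0 -> load_hparams() takes the else branch
    printf("n_head(1) = %u\n", hp.n_head(1));  // 32 on an attention layer
    return 0;
}
```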
So, I thought we need to call the following functions to set `hparams.n_embd_head_k/v` with the right values that come from `hidden_size_per_head` in config.json:
Lines 1082 to 1083 in 1be2787

```cpp
ml.get_key(LLM_KV_ATTENTION_KEY_LENGTH, hparams.n_embd_head_k, false);
ml.get_key(LLM_KV_ATTENTION_VALUE_LENGTH, hparams.n_embd_head_v, false);
```
That's why I sent this PR, but if I've misunderstood something about model loading, please let me know; I'd appreciate any advice on how we can load variants like pfnet/plamo-2.1-2b-cpt.
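As a side note on backward compatibility with previously converted GGUFs, here is a minimal sketch of what the optional (`false`) reads buy us; `get_key_optional` and the literal key name are hypothetical stand-ins that only mimic the required-flag behaviour of `ml.get_key`:

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical helper that only mimics ml.get_key(key, dst, /*required=*/false):
// a missing key leaves dst untouched instead of raising an error.
static bool get_key_optional(const std::map<std::string, uint32_t> & kv,
                             const std::string & key, uint32_t & dst) {
    const auto it = kv.find(key);
    if (it == kv.end()) {
        return false; // key absent, e.g. a GGUF converted before this change
    }
    dst = it->second;
    return true;
}

int main() {
    uint32_t n_embd_head_k = 0; // whatever load_hparams() assigned before the optional read

    // a newer GGUF carrying the key (written from hidden_size_per_head) overrides the value
    const std::map<std::string, uint32_t> newer = { { "plamo2.attention.key_length", 128 } };
    // an older GGUF without the key simply keeps the previous value
    const std::map<std::string, uint32_t> older = {};

    get_key_optional(newer, "plamo2.attention.key_length", n_embd_head_k);
    printf("with key:    n_embd_head_k = %u\n", n_embd_head_k); // 128

    n_embd_head_k = 0;
    get_key_optional(older, "plamo2.attention.key_length", n_embd_head_k);
    printf("without key: n_embd_head_k = %u\n", n_embd_head_k); // still 0
    return 0;
}
```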
Aha, that makes sense then, thank you for the explanation. :)
Force-pushed from 17e6bb5 to 07b55f4
The core intention of this PR is to use …
I found some incorrect parameter usages when converting and loading the original PLaMo2 parameters into the llama.cpp model. I fixed them in this PR.