Conversation

@CISC (Collaborator) commented Sep 28, 2025

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Sep 28, 2025
@ggerganov (Member) commented Sep 28, 2025

Don't think this is correct. The argument order for the GGML_F32_VEC_FMA macro is always the same, but it can differ on some instruction sets. Which one are you using?

I think this one is wrong:

#if defined(__FMA__)
// TODO: Does this work?
#define GGML_F32x4_FMA(a, b, c) _mm_fmadd_ps(b, c, a)
#else

https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_fmadd_ps&ig_expand=3103
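
For reference, a standalone snippet (illustrative only, not ggml code; the values are made up) showing the documented operand order of the intrinsic, _mm_fmadd_ps(a, b, c) = a*b + c:

#include <immintrin.h>
#include <stdio.h>

// Illustration only: Intel documents _mm_fmadd_ps(a, b, c) as a*b + c.
// Build with FMA enabled, e.g. gcc -mfma -O2.
int main(void) {
    __m128 a = _mm_set1_ps(2.0f);
    __m128 b = _mm_set1_ps(3.0f);
    __m128 c = _mm_set1_ps(4.0f);
    __m128 r = _mm_fmadd_ps(a, b, c);   // 2*3 + 4
    float out[4];
    _mm_storeu_ps(out, r);
    printf("%f\n", out[0]);             // prints 10.000000
    return 0;
}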

@CISC (Collaborator, Author) commented Sep 28, 2025

> Don't think this is correct. The argument order for the GGML_F32_VEC_FMA macro is always the same, but it can differ on some instruction sets. Which one are you using?

This one, but if I change that, everything else breaks:

#if defined(__FMA__)
#define GGML_F32x8_FMA(a, b, c) _mm256_fmadd_ps(b, c, a)
#else
#define GGML_F32x8_FMA(a, b, c) _mm256_add_ps(_mm256_mul_ps(b, c), a)
#endif
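
Note that both branches above compute b*c + a; a standalone check (illustrative only, not ggml code; values arbitrary) that the two paths agree:

#include <immintrin.h>
#include <stdio.h>

// Both branches of GGML_F32x8_FMA compute b*c + a:
// _mm256_fmadd_ps(b, c, a) = b*c + a, same as _mm256_add_ps(_mm256_mul_ps(b, c), a).
// Build with e.g. gcc -mavx -mfma -O2.
int main(void) {
    __m256 a = _mm256_set1_ps(1.0f);
    __m256 b = _mm256_set1_ps(2.0f);
    __m256 c = _mm256_set1_ps(3.0f);

    __m256 fma_path = _mm256_fmadd_ps(b, c, a);                // FMA branch
    __m256 fallback = _mm256_add_ps(_mm256_mul_ps(b, c), a);   // non-FMA branch

    float x[8], y[8];
    _mm256_storeu_ps(x, fma_path);
    _mm256_storeu_ps(y, fallback);
    printf("%f %f\n", x[0], y[0]);      // both print 7.000000
    return 0;
}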

@CISC (Collaborator, Author) commented Sep 28, 2025

This is confusing; in some places it's used like this:

ay2 = GGML_F32_VEC_FMA(ay2, ax2, vx);

and in other places like this:

sum[k][j] = GGML_F16_VEC_FMA(sum[k][j], ax[j], ay[j]);

@CISC (Collaborator, Author) commented Sep 28, 2025

According to Intel it should be a * b + c, so something is screwy:
https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_fmadd_ps&ig_expand=3107

@ggerganov (Member) commented Sep 28, 2025

It appears that the meaning of GGML_F32_VEC_FMA(a, b, c) and GGML_F16_VEC_FMA(a, b, c) is intended to be:

b*c + a

So the change that you propose makes sense now, and this is in line with the Intel docs.
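
A scalar restatement of that convention (illustrative only, not ggml code), showing how the two call sites quoted earlier read under it:

#include <stdio.h>

// Assumed convention from this discussion: GGML_F32_VEC_FMA(a, b, c) == b*c + a,
// i.e. the first argument is the accumulator.
static float vec_fma_ref(float a, float b, float c) {
    return b*c + a;
}

int main(void) {
    // ay2 = GGML_F32_VEC_FMA(ay2, ax2, vx)  ->  ay2 += ax2*vx
    float ay2 = 1.0f, ax2 = 2.0f, vx = 3.0f;
    ay2 = vec_fma_ref(ay2, ax2, vx);
    printf("%f\n", ay2);    // 7.000000

    // sum[k][j] = GGML_F16_VEC_FMA(sum[k][j], ax[j], ay[j])  ->  sum[k][j] += ax[j]*ay[j]
    float sum = 1.0f, ax = 2.0f, ay = 3.0f;
    sum = vec_fma_ref(sum, ax, ay);
    printf("%f\n", sum);    // 7.000000
    return 0;
}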

@CISC (Collaborator, Author) commented Sep 28, 2025

> So the change that you propose makes sense now, and this is in line with the Intel docs.

Yes, I think so too; it's just somewhat confusing with the ordering and naming between the vec_mad1 and vec_mad functions...
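
For context on that naming, a rough sketch of the two helpers as I read them (signatures paraphrased and treated as assumptions, not verbatim ggml code):

// Paraphrased sketch, not copied from ggml: ggml_vec_mad_f32 accumulates into y,
// while ggml_vec_mad1_f32 (the function touched by this PR) is a scale-plus-bias
// without an accumulator, which is why the FMA argument order is easy to get wrong.
static void vec_mad_f32_ref(int n, float * y, const float * x, float v) {
    for (int i = 0; i < n; ++i) {
        y[i] += x[i]*v;     // accumulator form: matches FMA(acc, x, v) = x*v + acc
    }
}

static void vec_mad1_f32_ref(int n, float * y, const float * x, float s, float b) {
    for (int i = 0; i < n; ++i) {
        y[i] = x[i]*s + b;  // scale-plus-bias form: the added term is a constant
    }
}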

@CISC (Collaborator, Author) commented Sep 28, 2025

It seems this codepath is not taken when built for CUDA, as the backend-ops test succeeds...

Edit: I guess this means we never really test the hardware-optimized CPU backend?

@ggerganov (Member) commented Sep 28, 2025

> It seems this codepath is not taken when built for CUDA, as the backend-ops test succeeds...
>
> Edit: I guess this means we never really test the hardware-optimized CPU backend?

I think the reason we don't see it is that the tests currently have ne0 == 10, which is too small to enter the SIMD path. If you add a larger test (for example, ne0 == 100) it should fail.

test_cases.emplace_back(new test_scale());
test_cases.emplace_back(new test_scale(GGML_TYPE_F32, {10, 10, 10, 10}, 2.0f, 1.0f));
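
A hedged sketch of what the extra case might look like (the exact shape added in the PR may differ; only ne0 needs to be large enough to reach the SIMD path):

// hypothetical addition alongside the cases above; ne0 = 100 so the SIMD path is exercised
test_cases.emplace_back(new test_scale(GGML_TYPE_F32, {100, 10, 10, 10}, 2.0f, 1.0f));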

@CISC (Collaborator, Author) commented Sep 28, 2025

> It seems this codepath is not taken when built for CUDA, as the backend-ops test succeeds...
> Edit: I guess this means we never really test the hardware-optimized CPU backend?

> I think the reason we don't see it is that the tests currently have ne0 == 10, which is too small to enter the SIMD path. If you add a larger test (for example, ne0 == 100) it should fail.

Yep, it does, I'll add the test.

@github-actions bot added the testing label (Everything test related) on Sep 28, 2025
@CISC merged commit b887d2f into master on Sep 28, 2025 (63 of 67 checks passed).
@CISC deleted the cisc/fix-ggml-vec-mad1-f32-fma branch on September 28, 2025 at 21:15.