
Conversation

@CISC (Collaborator) commented Sep 28, 2025

CUDA and Metal support only a very limited column width for argsort; add a check so that we can fall back to the CPU.

Edit: Looks like Vulkan is limited too, but already checks:
https://github.com/ggml-org/llama.cpp/actions/runs/18081472255/job/51445345624?pr=16323#step:3:10561

OpenCL checks:

```cpp
return cols <= max_workgroup_size && op->src[0]->type == GGML_TYPE_F32;
```

@qnixsynapse @NeoZhangJianyu SYCL is probably limited too, but does not check.
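
For reference, a guard along the lines of the OpenCL one could look roughly like this in a SYCL `supports_op` hook. This is only a sketch; the surrounding function and the `device` variable are assumptions, not actual backend code (the SYCL `get_info` query itself is real API):

```cpp
// Sketch of a possible supports_op guard (names are assumptions): reject
// argsort columns wider than the device work-group size so ggml falls back
// to the CPU implementation instead of failing at run time.
case GGML_OP_ARGSORT: {
    const size_t max_wg = device.get_info<sycl::info::device::max_work_group_size>();
    return (size_t) op->src[0]->ne[0] <= max_wg && op->src[0]->type == GGML_TYPE_F32;
}
```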

@CISC requested a review from slaren as a code owner (September 28, 2025 23:43).
@github-actions bot added labels: testing (Everything test related), Nvidia GPU (Issues specific to Nvidia GPUs), ggml (changes relating to the ggml tensor library for machine learning) (Sep 28, 2025).
@CISC requested a review from ggerganov as a code owner (September 29, 2025 00:10).
@github-actions bot added the label Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) (Sep 29, 2025).
@CISC changed the title from "cuda : check cuda argsort limits and add test" to "ggml : check cuda and metal argsort limits and add test" (Sep 29, 2025).
@jeffbolznv (Collaborator) commented:

Is the model really using 16k for ne[0]? That's significantly bigger than what I've seen before, and at that size we wouldn't be able to use shared memory on all hardware.

@CISC (Collaborator, Author) commented Sep 29, 2025

> Is the model really using 16k for ne[0]? That's significantly bigger than what I've seen before, and at that size we wouldn't be able to use shared memory on all hardware.

Depends on the input; it will be n_tokens * (n_expert / 8).
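
For a concrete illustration (hypothetical numbers, not from the thread), this is how the column width can reach the ~16k mentioned above:

```cpp
// Hypothetical sizing, for illustration only:
const int64_t n_tokens = 512;
const int64_t n_expert = 256;                   // grouped into 8 groups
const int64_t ne0 = n_tokens * (n_expert / 8);  // 512 * 32 = 16384 argsort columns
```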

@CISC (Collaborator, Author) commented Sep 29, 2025

> Is the model really using 16k for ne[0]? That's significantly bigger than what I've seen before, and at that size we wouldn't be able to use shared memory on all hardware.

> Depends on the input; it will be n_tokens * (n_expert / 8).

I'm thinking I should somehow bypass the first top_k in the group selection (finding the top 2 in each group), as this could get quite slow...

@jeffbolznv (Collaborator) commented:

If you only need the indices of the top 2, then yeah a full sort is overkill. Seems like you'd want a modified argmax...

@CISC (Collaborator, Author) commented Sep 29, 2025

> If you only need the indices of the top 2, then yeah a full sort is overkill. Seems like you'd want a modified argmax...

Hmmm, an argmax2 or even argmaxn could be nice.
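
A minimal sketch of what such an argmax2 could do per row, written in plain C++ for illustration only (this is not existing ggml API): one O(n) pass that keeps the two best values and their indices instead of sorting all ne[0] columns.

```cpp
#include <cstdint>
#include <utility>

// Sketch of a per-row "argmax2": returns the indices of the two largest
// values in row[0..n). Assumes n >= 2. One linear pass instead of a full
// argsort of the whole row.
static std::pair<int64_t, int64_t> argmax2(const float * row, int64_t n) {
    int64_t i0 = 0, i1 = 1;
    if (row[i1] > row[i0]) std::swap(i0, i1);
    for (int64_t i = 2; i < n; ++i) {
        if (row[i] > row[i0]) {
            i1 = i0; i0 = i;  // new best pushes the old best to second place
        } else if (row[i] > row[i1]) {
            i1 = i;           // new second best
        }
    }
    return {i0, i1};
}
```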

@CISC merged commit adc7634 into master (Sep 29, 2025; 64 of 67 checks passed).
@CISC deleted the cisc/check-cuda-argsort-limits branch (September 29, 2025 09:09).
Comment on lines -3642 to +3646
```diff
-        case GGML_OP_ARGSORT:
         case GGML_OP_ACC:
             return true;
+        case GGML_OP_ARGSORT:
+            // TODO: Support arbitrary column width
+            return op->src[0]->ne[0] <= 1024;
```
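
For context, `supports_op` is what lets the scheduler route unsupported ops elsewhere; a simplified sketch of the fallback idea (not the actual scheduler code, and `gpu_backend`, `cpu_backend`, and `node` are placeholder names — `ggml_backend_supports_op` itself is real API):

```cpp
// If the GPU backend rejects the op via supports_op, run it on the CPU backend.
ggml_backend_t backend = ggml_backend_supports_op(gpu_backend, node)
                             ? gpu_backend
                             : cpu_backend;
```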
Collaborator commented:

Preferably keep the order of ggml ops in switch statements consistent with the order in which they are declared in ggml.h.

@CISC (Collaborator, Author) replied:

I didn't reflect much on the order; I just didn't want to unnecessarily break up the fall-throughs. I'll keep it in mind for the future.
