ggml : check cuda and metal argsort limits and add test #16323
Is the model really using 16k for ne[0]? That's significantly bigger than what I've seen before, and at that size we wouldn't be able to use shared memory on all hardware.
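A rough calculation shows why 16k columns is awkward. The sketch below assumes (none of this is stated in the PR) a single-block bitonic argsort that pads each row to the next power of two, keeps one int32 index per padded column in shared memory, and uses one thread per padded column. The 1024 threads-per-block and 48 KiB shared-memory figures are typical CUDA defaults used here as assumptions, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed per-block limits (typical CUDA defaults, not queried from a real device).
constexpr int64_t kMaxThreadsPerBlock = 1024;
constexpr size_t  kSharedMemPerBlock  = 48 * 1024;  // bytes

// Hypothetical helper: smallest power of two >= n.
int64_t next_pow2(int64_t n) {
    assert(n >= 1);
    int64_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

// Bytes of shared memory needed for one int32 index per padded column of a row.
size_t argsort_shared_bytes(int64_t ncols) {
    return static_cast<size_t>(next_pow2(ncols)) * sizeof(int32_t);
}

// True if one row can be sorted by a single block under the assumptions above:
// one thread per padded column and the whole index array resident in shared memory.
bool argsort_row_fits(int64_t ncols) {
    return next_pow2(ncols) <= kMaxThreadsPerBlock &&
           argsort_shared_bytes(ncols) <= kSharedMemPerBlock;
}
```

Under these assumptions 1024 columns fits exactly, 1025 columns already pads to 2048 threads, and 16384 columns would need 16384 threads and 64 KiB of shared memory for a single row.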
Depends on the input, it will be …
I'm thinking I should bypass the first …
If you only need the indices of the top 2, then yeah, a full sort is overkill. Seems like you'd want a modified argmax...
Hmmm, an …
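The "modified argmax" idea can be sketched as one pass per row that keeps the two best indices seen so far, which is O(n) instead of a full O(n log n) sort. This is a hypothetical CPU illustration (top2_indices is not a ggml function); a GPU kernel would additionally need a parallel reduction across threads.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Returns the indices of the two largest values in row[0..n), largest first.
// Single O(n) pass instead of sorting the whole row. Strict '>' comparisons
// mean that among equal values the earlier index ranks higher. Requires n >= 2.
std::pair<int32_t, int32_t> top2_indices(const float * row, int32_t n) {
    assert(row != nullptr && n >= 2);
    int32_t best   = row[1] > row[0] ? 1 : 0;
    int32_t second = 1 - best;
    for (int32_t i = 2; i < n; ++i) {
        if (row[i] > row[best]) {
            // New maximum: the old best becomes the runner-up.
            second = best;
            best   = i;
        } else if (row[i] > row[second]) {
            second = i;
        }
    }
    return {best, second};
}
```

For example, the row {0.1, 0.7, 0.3, 0.9} yields indices 3 and 1.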
-        case GGML_OP_ARGSORT:
         case GGML_OP_ACC:
             return true;
+        case GGML_OP_ARGSORT:
+            // TODO: Support arbitrary column width
+            return op->src[0]->ne[0] <= 1024;
Preferably keep the order of ggml ops in switch statements consistent with the order in which they're declared in ggml.h.
I didn't think much about the order; I just didn't want to break up the fall-throughs unnecessarily. I'll keep it in mind for the future.
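To show what the split-out case does, here is a self-contained sketch with hypothetical mock types standing in for ggml_tensor and the op enum (the real structures are much larger). It mirrors only the logic visible in the diff: ACC is accepted unconditionally, and ARGSORT only when its source row length ne[0] is at most 1024.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-ins for ggml types; the real
// ggml_tensor and enum ggml_op have many more fields and values.
enum class mock_op { ACC, ARGSORT };

struct mock_tensor {
    mock_op             op;
    int64_t             ne[4];   // ne[0] is the row length (number of columns)
    const mock_tensor * src[1];  // only the first source is modeled
};

// Mirrors the shape of the check in the diff: ACC is always accepted,
// ARGSORT only when the source row length is at most 1024.
bool mock_supports_op(const mock_tensor * op) {
    assert(op != nullptr);
    switch (op->op) {
        case mock_op::ACC:
            return true;
        case mock_op::ARGSORT:
            // TODO (from the diff): support arbitrary column width
            return op->src[0]->ne[0] <= 1024;
    }
    return false;
}
```

Keeping ARGSORT out of the ACC fall-through is what allows it to return a shape-dependent answer instead of an unconditional true.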
CUDA and Metal support only a very limited column width for argsort; check it so we can fall back to the CPU.

Edit: Looks like Vulkan is limited too, but it already checks:
https://github.com/ggml-org/llama.cpp/actions/runs/18081472255/job/51445345624?pr=16323#step:3:10561
OpenCL checks: llama.cpp/ggml/src/ggml-opencl/ggml-opencl.cpp, line 2967 (commit 3ecb2f6).
@qnixsynapse @NeoZhangJianyu SYCL is probably limited too, but it does not check.
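The CPU fallback the description relies on can be modeled as a priority-ordered search: each op goes to the first backend whose support check passes, with the CPU listed last and accepting everything. This is a simplified model under that assumption, not ggml's scheduler API; mock_backend and pick_backend are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical backend descriptor: a name plus a predicate saying whether it
// can run argsort on rows of the given length.
struct mock_backend {
    std::string                  name;
    std::function<bool(int64_t)> supports_argsort;
};

// Pick the first backend (in priority order) that accepts the op. The CPU entry
// is expected to be last and to accept everything, so a shape the GPU cannot
// handle is routed to the CPU instead of being handed to a kernel that can't sort it.
std::string pick_backend(const std::vector<mock_backend> & backends, int64_t ncols) {
    for (const mock_backend & b : backends) {
        if (b.supports_argsort(ncols)) {
            return b.name;
        }
    }
    assert(false && "no backend accepted the op");
    return "";
}
```

With a GPU entry whose predicate is n <= 1024 followed by a CPU entry, 1024 columns stay on the GPU and 16384 columns go to the CPU. A backend that skips the check (as described for SYCL) would claim every shape, so the fallback never triggers.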