Skip to content

Conversation

netrunnereve
Copy link
Collaborator

This adds a test-backend-ops perf test for mul_mat_id using the matrix sizes from qwen3-30b-a3b and gpt-oss-20b. As this is really only meant for testing optimizations I'm only including a limited number of quants and batch sizes for now.

@github-actions github-actions bot added the testing Everything test related label Aug 24, 2025
@netrunnereve netrunnereve changed the title Add test-backend-ops performance test for mul mat id tests: add test-backend-ops performance test for mul mat id Aug 24, 2025
@netrunnereve netrunnereve merged commit 44b1efa into ggml-org:master Aug 26, 2025
48 checks passed
@netrunnereve netrunnereve deleted the mat_mul_id branch August 26, 2025 15:43
Minh141120 pushed a commit to menloresearch/llama.cpp that referenced this pull request Aug 27, 2025
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Aug 28, 2025
…nemotron-nano-15409

* origin/master: (59 commits)
scripts: add sqlite3 check for compare-commits.sh (ggml-org#15633)
kv-cache : remove LLAMA_SET_ROWS checks (ggml-org#15505)
gguf-py: byteswapping improvements (ggml-org#12851)
cli : change log to warning to explain reason for stopping (ggml-org#15604)
model-conversion : add mmproj conversion target (ggml-org#15628)
cuda: Add cublasLt_static linking when GGML_STATIC is enabled (ggml-org#15622)
server: higher timeout for tests (ggml-org#15621)
presets : add qwen3-30B-a3b FIM (ggml-org#15616)
HIP: Enable support for ggml_backend_cuda_register_host_buffer (ggml-org#15615)
kv-cache : better estimate of n_kv for multi-sequence batches (ggml-org#15610)
CANN: refactor mask handling and improve performance in FA (ggml-org#15561)
ggml-cpu : add basic RVV support for vector f32 ops (ggml-org#15057)
common : add -m to bash completion for --model [no ci] (ggml-org#15591)
OpenCL: add fused group_norm/norm, mul, add (ggml-org#15314)
tests : fix test-opt with GGML_BACKEND_DL (ggml-org#15599)
SYCL: fix rms_norm_mul_add for tensor dim not a multiple of sg_size (ggml-org#15592)
mtmd : fix mtmd ios build (ggml-org#15579)
tests: add performance test for mul mat id (ggml-org#15543)
llamafile: PowerPC Sgemm Optimization (ggml-org#15558)
graph : fix assert in memory-less build_attn (ggml-org#15590)
...
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Aug 28, 2025
…upport

* origin/master: (61 commits)
scripts: add sqlite3 check for compare-commits.sh (ggml-org#15633)
kv-cache : remove LLAMA_SET_ROWS checks (ggml-org#15505)
gguf-py: byteswapping improvements (ggml-org#12851)
cli : change log to warning to explain reason for stopping (ggml-org#15604)
model-conversion : add mmproj conversion target (ggml-org#15628)
cuda: Add cublasLt_static linking when GGML_STATIC is enabled (ggml-org#15622)
server: higher timeout for tests (ggml-org#15621)
presets : add qwen3-30B-a3b FIM (ggml-org#15616)
HIP: Enable support for ggml_backend_cuda_register_host_buffer (ggml-org#15615)
kv-cache : better estimate of n_kv for multi-sequence batches (ggml-org#15610)
CANN: refactor mask handling and improve performance in FA (ggml-org#15561)
ggml-cpu : add basic RVV support for vector f32 ops (ggml-org#15057)
common : add -m to bash completion for --model [no ci] (ggml-org#15591)
OpenCL: add fused group_norm/norm, mul, add (ggml-org#15314)
tests : fix test-opt with GGML_BACKEND_DL (ggml-org#15599)
SYCL: fix rms_norm_mul_add for tensor dim not a multiple of sg_size (ggml-org#15592)
mtmd : fix mtmd ios build (ggml-org#15579)
tests: add performance test for mul mat id (ggml-org#15543)
llamafile: PowerPC Sgemm Optimization (ggml-org#15558)
graph : fix assert in memory-less build_attn (ggml-org#15590)
...
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Aug 28, 2025
…g-model-disabled-agent-prefill

* origin/master: (76 commits)
scripts: add sqlite3 check for compare-commits.sh (ggml-org#15633)
kv-cache : remove LLAMA_SET_ROWS checks (ggml-org#15505)
gguf-py: byteswapping improvements (ggml-org#12851)
cli : change log to warning to explain reason for stopping (ggml-org#15604)
model-conversion : add mmproj conversion target (ggml-org#15628)
cuda: Add cublasLt_static linking when GGML_STATIC is enabled (ggml-org#15622)
server: higher timeout for tests (ggml-org#15621)
presets : add qwen3-30B-a3b FIM (ggml-org#15616)
HIP: Enable support for ggml_backend_cuda_register_host_buffer (ggml-org#15615)
kv-cache : better estimate of n_kv for multi-sequence batches (ggml-org#15610)
CANN: refactor mask handling and improve performance in FA (ggml-org#15561)
ggml-cpu : add basic RVV support for vector f32 ops (ggml-org#15057)
common : add -m to bash completion for --model [no ci] (ggml-org#15591)
OpenCL: add fused group_norm/norm, mul, add (ggml-org#15314)
tests : fix test-opt with GGML_BACKEND_DL (ggml-org#15599)
SYCL: fix rms_norm_mul_add for tensor dim not a multiple of sg_size (ggml-org#15592)
mtmd : fix mtmd ios build (ggml-org#15579)
tests: add performance test for mul mat id (ggml-org#15543)
llamafile: PowerPC Sgemm Optimization (ggml-org#15558)
graph : fix assert in memory-less build_attn (ggml-org#15590)
...
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Aug 28, 2025
…nemotron-nano-15409

* origin/master: (59 commits)
scripts: add sqlite3 check for compare-commits.sh (ggml-org#15633)
kv-cache : remove LLAMA_SET_ROWS checks (ggml-org#15505)
gguf-py: byteswapping improvements (ggml-org#12851)
cli : change log to warning to explain reason for stopping (ggml-org#15604)
model-conversion : add mmproj conversion target (ggml-org#15628)
cuda: Add cublasLt_static linking when GGML_STATIC is enabled (ggml-org#15622)
server: higher timeout for tests (ggml-org#15621)
presets : add qwen3-30B-a3b FIM (ggml-org#15616)
HIP: Enable support for ggml_backend_cuda_register_host_buffer (ggml-org#15615)
kv-cache : better estimate of n_kv for multi-sequence batches (ggml-org#15610)
CANN: refactor mask handling and improve performance in FA (ggml-org#15561)
ggml-cpu : add basic RVV support for vector f32 ops (ggml-org#15057)
common : add -m to bash completion for --model [no ci] (ggml-org#15591)
OpenCL: add fused group_norm/norm, mul, add (ggml-org#15314)
tests : fix test-opt with GGML_BACKEND_DL (ggml-org#15599)
SYCL: fix rms_norm_mul_add for tensor dim not a multiple of sg_size (ggml-org#15592)
mtmd : fix mtmd ios build (ggml-org#15579)
tests: add performance test for mul mat id (ggml-org#15543)
llamafile: PowerPC Sgemm Optimization (ggml-org#15558)
graph : fix assert in memory-less build_attn (ggml-org#15590)
...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
testing Everything test related
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants