tests : add script to benchmark whisper.cpp on LibriSpeech corpus #2999
Conversation
LibriSpeech is a widely-used benchmark dataset for training and testing speech recognition models. This adds a set of scripts to measure the recognition accuracy of whisper.cpp models, following the common benchmark standards.

Signed-off-by: Fujimoto Seiji <[email protected]>
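For context, the accuracy metric conventionally reported on LibriSpeech is the word error rate (WER), computed from a word-level alignment of the normalized reference and hypothesis transcripts:

$$
\mathrm{WER} = \frac{S + D + I}{N}
$$

where $S$, $D$, and $I$ count the substituted, deleted, and inserted words in the alignment, and $N$ is the number of words in the reference.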
```
Code in this directory is adapted from the OpenAI Whisper project
(https://github.com/openai/whisper) and carries the following
copyright and license.
```
As I mentioned in LICENSE, the normalizer implementation in the `tests/normalizer/` subfolder was ported from upstream.
- We need this to get a comparable WER score. See this notebook for how OpenAI evaluates their speech recognition models.
- The reason I committed these files to this repository is to minimize the dependencies needed to run the benchmark script: `pip install openai-whisper` pulls in the full PyTorch stack, so it's heavy.
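As a rough illustration (not necessarily how the PR's script is written), scoring with the ported normalizer could look like the sketch below. It assumes the class keeps upstream's `EnglishTextNormalizer` name and uses `jiwer` for the edit-distance computation, as in OpenAI's evaluation notebook; the import path is hypothetical.

```python
import jiwer  # pip install jiwer

# Ported normalizer; this import path is an assumption.
from normalizer import EnglishTextNormalizer

normalizer = EnglishTextNormalizer()

def wer(references, hypotheses):
    # Normalize both sides the same way before scoring, so casing and
    # punctuation differences do not inflate the error rate.
    refs = [normalizer(r) for r in references]
    hyps = [normalizer(h) for h in hypotheses]
    return jiwer.wer(refs, hyps)

print(wer(["Hello, world!"], ["hello world"]))  # -> 0.0 after normalization
```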
```
WHISPER_FLAGS = --no-prints --threads 8 --language en --output-txt
```

Check out `eval.mk` for more details.
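For illustration only, a per-sample run with these flags might look like the following Python sketch; the actual driver logic lives in `eval.mk`, and the model path and output-file layout here are assumptions to verify there.

```python
import subprocess

# Flags mirrored from the eval.mk excerpt above.
WHISPER_FLAGS = ["--no-prints", "--threads", "8", "--language", "en", "--output-txt"]
MODEL = "models/ggml-base.en.bin"  # hypothetical model path

def transcribe(wav_path: str) -> str:
    # With --output-txt, whisper-cli writes the transcript next to the
    # input as "<wav_path>.txt" (output layout assumed; check eval.mk).
    subprocess.run(
        ["whisper-cli", "-m", MODEL, *WHISPER_FLAGS, "-f", wav_path],
        check=True,
    )
    with open(wav_path + ".txt") as f:
        return f.read().strip()

print(transcribe("samples/sample0.wav"))  # hypothetical sample file
```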
This README file describes how to perform the benchmark tests. Confirmed to work on Ubuntu 24.04 and Amazon Linux 2023.
I'll try this out locally and take a closer look as well soon.
Feedback from Daniel Bevenius. This adds a short code example showing how to prepare the `whisper-cli` command, to make the initial setup step a little clearer.

Signed-off-by: Fujimoto Seiji <[email protected]>
Based on feedback from Georgi Gerganov. Instead of setting up a virtual environment in the Makefile, let users set up the Python environment themselves. This is better since users may have their own preferred workflow/toolkit.

Signed-off-by: Fujimoto Seiji <[email protected]>
This is really great!
There are two things we can improve on:
- The dataset seems to contain only relatively short speech segments. I think it would be good to have a dataset with somewhat longer samples (i.e. a few minutes) in order to exercise the rolling-window transcription that Whisper does.
- The current implementation loads and unloads the entire model for each sample. This is very inefficient. Instead, it should use `whisper-server` so the model starts once and all the samples are sent via HTTP requests; see the sketch after this comment. This will make the benchmark much faster.
For now we can merge and improve on these later.
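A minimal sketch of that server-based loop, assuming a `whisper-server` instance is already running with the model loaded and that it exposes a multipart `/inference` endpoint on the default port (the endpoint, field names, and file names here are assumptions to check against the server's README):

```python
import requests  # pip install requests

SERVER_URL = "http://127.0.0.1:8080/inference"  # default port assumed

def transcribe(path: str) -> str:
    # The model is loaded once by the long-lived server process, so each
    # sample costs only an HTTP round-trip instead of a full model reload.
    with open(path, "rb") as f:
        resp = requests.post(
            SERVER_URL,
            files={"file": f},
            data={"response_format": "json"},
        )
    resp.raise_for_status()
    return resp.json()["text"]

for sample in ["sample0.wav", "sample1.wav"]:  # hypothetical files
    print(transcribe(sample))
```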
@ggerganov @danbev Thank you! I'm glad that it helps this project.
…ml-org#2999)

* tests : add script to benchmark whisper.cpp on LibriSpeech corpus

  LibriSpeech is a widely-used benchmark dataset for training and
  testing speech recognition models. This adds a set of scripts to
  measure the recognition accuracy of whisper.cpp models, following
  the common benchmark standards.

  Signed-off-by: Fujimoto Seiji <[email protected]>

* Document how to prepare `whisper-cli` and model files

  Feedback from Daniel Bevenius. This adds a short code example how
  to prepare the `whisper-cli` command, to make the initial setup
  step a little bit clearer.

  Signed-off-by: Fujimoto Seiji <[email protected]>

* tests : Simplify how to set up Python environment

  Based on a feedback from Georgi Gerganov. Instead of setting up a
  virtual environment in Makefile, let users set up the Python
  environment. This is better since users may have their own
  preferred workflow/toolkit.

  Signed-off-by: Fujimoto Seiji <[email protected]>

---------

Signed-off-by: Fujimoto Seiji <[email protected]>
* ggerganov/master: (25 commits)
  examples : add HEAPU8 to exported runtime methods (ggml-org#3062)
  ruby : make Ruby bindings installed with build options (ggml-org#3056)
  whisper : add no_context parameter to whisper_params (ggml-org#3045)
  examples : add FFmpeg v7.0 support to ffmpeg-transcode.cpp (ggml-org#3038)
  ruby: use CMake in build process (ggml-org#3043)
  docs : update README.md to note newer nvidia gpus (ggml-org#3031)
  addon.node : support max_context api for addon.node (ggml-org#3025)
  whisper : reduce delta_min from 1000ms to 100ms (ggml-org#3028)
  docs : document how to use 'WHISPER_FFMPEG' build option (ggml-org#3029)
  docs : fix README.md (ggml-org#3024)
  xcf : use check for visionos build version (ggml-org#3021)
  ruby : fix types of arguments for rb_get_kwargs in ruby_whisper_params.c (ggml-org#3022)
  ruby : Update uri.rb (ggml-org#3016)
  models : fix dead link to models in readme (ggml-org#3006)
  ruby : change homepage URI in Ruby gemspec (ggml-org#3007)
  tests : add script to benchmark whisper.cpp on LibriSpeech corpus (ggml-org#2999)
  whisper : fix "bench-all outputs an invalid result on larger models" (ggml-org#3002)
  rename : ggerganov -> ggml-org (ggml-org#3005)
  examples : update server.py to match github pages app [no ci] (ggml-org#3004)
  whisper.wasm : fix unknown language issue (ggml-org#3000)
  ...