tests : add script to benchmark whisper.cpp on LibriSpeech corpus #2999
Conversation
LibriSpeech is a widely-used benchmark dataset for training and testing speech recognition models. This adds a set of scripts to measure the recognition accuracy of whisper.cpp models, following the common benchmark standards.

Signed-off-by: Fujimoto Seiji <[email protected]>
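For context, the accuracy metric conventionally reported on LibriSpeech is the word error rate (WER), computed from a word-level alignment of the normalized reference and hypothesis transcripts:

$$
\mathrm{WER} = \frac{S + D + I}{N}
$$

where $S$, $D$, and $I$ count the substituted, deleted, and inserted words in the alignment, and $N$ is the number of words in the reference.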
```
Code in this directory is adapted from the OpenAI Whisper project
(https://github.com/openai/whisper) and carries the following
copyright and license.
```
As I mentioned in LICENSE, the normalizer implementation in the `tests/normalizer/` subfolder was ported from upstream.
- We need this to get a comparable WER score. See this notebook for how OpenAI evaluates their speech recognition models.
- The reason I committed these files to this repository is to minimize the dependencies needed to run the benchmark script: `pip install openai-whisper` pulls in the full PyTorch stack, so it's heavy.
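As a rough illustration (not necessarily how the PR's script is written), scoring with the ported normalizer could look like the sketch below. It assumes the class keeps upstream's `EnglishTextNormalizer` name and uses `jiwer` for the edit-distance computation, as in OpenAI's evaluation notebook; the import path is hypothetical.

```python
import jiwer  # pip install jiwer

# Ported normalizer; this import path is an assumption.
from normalizer import EnglishTextNormalizer

normalizer = EnglishTextNormalizer()

def wer(references, hypotheses):
    # Normalize both sides the same way before scoring, so casing and
    # punctuation differences do not inflate the error rate.
    refs = [normalizer(r) for r in references]
    hyps = [normalizer(h) for h in hypotheses]
    return jiwer.wer(refs, hyps)

print(wer(["Hello, world!"], ["hello world"]))  # -> 0.0 after normalization
```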
```
WHISPER_FLAGS = --no-prints --threads 8 --language en --output-txt
```

Check out `eval.mk` for more details.
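For illustration only, a per-sample run with these flags might look like the following Python sketch; the actual driver logic lives in `eval.mk`, and the model path and output-file layout here are assumptions to verify there.

```python
import subprocess

# Flags mirrored from the eval.mk excerpt above.
WHISPER_FLAGS = ["--no-prints", "--threads", "8", "--language", "en", "--output-txt"]
MODEL = "models/ggml-base.en.bin"  # hypothetical model path

def transcribe(wav_path: str) -> str:
    # With --output-txt, whisper-cli writes the transcript next to the
    # input as "<wav_path>.txt" (output layout assumed; check eval.mk).
    subprocess.run(
        ["whisper-cli", "-m", MODEL, *WHISPER_FLAGS, "-f", wav_path],
        check=True,
    )
    with open(wav_path + ".txt") as f:
        return f.read().strip()

print(transcribe("samples/sample0.wav"))  # hypothetical sample file
```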
This README file describes how to perform the benchmark tests. Confirmed to work on Ubuntu 24.04 and Amazon Linux 2023.
I'll try this out locally and take a closer look as well soon.
Feedback from Daniel Bevenius. This adds a short code example showing how to prepare the `whisper-cli` command, to make the initial setup step a little clearer.

Signed-off-by: Fujimoto Seiji <[email protected]>
Based on feedback from Georgi Gerganov. Instead of setting up a virtual environment in the Makefile, let users set up the Python environment themselves. This is better since users may have their own preferred workflow/toolkit.

Signed-off-by: Fujimoto Seiji <[email protected]>
This is really great!
There are two things we can improve on:
- The dataset seems to contain only relatively short speech segments. I think it would be good to have a dataset with somewhat longer samples (i.e. a few minutes) in order to exercise the rolling-window transcription that Whisper does.
- The current implementation loads and unloads the entire model for each sample. This is very inefficient. Instead, it should use `whisper-server` so the model starts once and all the samples are sent via HTTP requests; see the sketch after this comment. This will make the benchmark much faster.
For now we can merge and improve on these later.
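A minimal sketch of that server-based loop, assuming a `whisper-server` instance is already running with the model loaded and that it exposes a multipart `/inference` endpoint on the default port (the endpoint, field names, and file names here are assumptions to check against the server's README):

```python
import requests  # pip install requests

SERVER_URL = "http://127.0.0.1:8080/inference"  # default port assumed

def transcribe(path: str) -> str:
    # The model is loaded once by the long-lived server process, so each
    # sample costs only an HTTP round-trip instead of a full model reload.
    with open(path, "rb") as f:
        resp = requests.post(
            SERVER_URL,
            files={"file": f},
            data={"response_format": "json"},
        )
    resp.raise_for_status()
    return resp.json()["text"]

for sample in ["sample0.wav", "sample1.wav"]:  # hypothetical files
    print(transcribe(sample))
```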
@ggerganov @danbev Thank you! I'm glad that it helps this project.
…ml-org#2999)

* tests : add script to benchmark whisper.cpp on LibriSpeech corpus

  LibriSpeech is a widely-used benchmark dataset for training and
  testing speech recognition models. This adds a set of scripts to
  measure the recognition accuracy of whisper.cpp models, following
  the common benchmark standards.

  Signed-off-by: Fujimoto Seiji <[email protected]>

* Document how to prepare `whisper-cli` and model files

  Feedback from Daniel Bevenius. This adds a short code example how
  to prepare the `whisper-cli` command, to make the initial setup
  step a little bit clearer.

  Signed-off-by: Fujimoto Seiji <[email protected]>

* tests : Simplify how to set up Python environment

  Based on a feedback from Georgi Gerganov. Instead of setting up a
  virtual environment in Makefile, let users set up the Python
  environment. This is better since users may have their own
  preferred workflow/toolkit.

  Signed-off-by: Fujimoto Seiji <[email protected]>

---------

Signed-off-by: Fujimoto Seiji <[email protected]>
* ggerganov/master: (25 commits)
  examples : add HEAPU8 to exported runtime methods (ggml-org#3062)
  ruby : make Ruby bindings installed with build options (ggml-org#3056)
  whisper : add no_context parameter to whisper_params (ggml-org#3045)
  examples : add FFmpeg v7.0 support to ffmpeg-transcode.cpp (ggml-org#3038)
  ruby: use CMake in build process (ggml-org#3043)
  docs : update README.md to note newer nvidia gpus (ggml-org#3031)
  addon.node : support max_context api for addon.node (ggml-org#3025)
  whisper : reduce delta_min from 1000ms to 100ms (ggml-org#3028)
  docs : document how to use 'WHISPER_FFMPEG' build option (ggml-org#3029)
  docs : fix README.md (ggml-org#3024)
  xcf : use check for visionos build version (ggml-org#3021)
  ruby : fix types of arguments for rb_get_kwargs in ruby_whisper_params.c (ggml-org#3022)
  ruby : Update uri.rb (ggml-org#3016)
  models : fix dead link to models in readme (ggml-org#3006)
  ruby : change homepage URI in Ruby gemspec (ggml-org#3007)
  tests : add script to benchmark whisper.cpp on LibriSpeech corpus (ggml-org#2999)
  whisper : fix "bench-all outputs an invalid result on larger models" (ggml-org#3002)
  rename : ggerganov -> ggml-org (ggml-org#3005)
  examples : update server.py to match github pages app [no ci] (ggml-org#3004)
  whisper.wasm : fix unknown language issue (ggml-org#3000)
  ...