Description
While working on #1806, I came across the following:
1. The list of models in `./models/README.md` is currently a subset of the list of models in `./models/download-ggml-model.sh`, which is itself a subset of what is available from https://huggingface.co/ggerganov/whisper.cpp/tree/main (bulk of options), https://ggml.ggerganov.com/ (duplicates of the first only? I didn't check carefully), and https://huggingface.co/akashmjn/tinydiarize-whisper.cpp/tree/main (just one). Is there some logic as to which models are included / not included? I was tempted to add more to both lists, but wanted to ask first about inclusion criteria.

2. In ./README.md#more-audio-samples, there's a list of `make` commands starting with `make tiny.en`; perhaps this used to be in a different section higher up? Are they still current / helpful? Oh, huh, I probably should've also changed this bit (in the "Quick start" section):

   Now build the main example and transcribe an audio file like this:

   ```bash
   # build the main example
   make

   # transcribe an audio file
   ./main -f samples/jfk.wav
   ```

   ... to clarify that `make` alone defaults to the `base` model, and also downloads the model if necessary (mentioned lower down: "The command downloads the `base.en` model converted to custom `ggml` format and runs the inference on all `.wav` samples in the folder `samples`."), meaning there's no need to run `bash ./models/download-ggml-model.sh base.en` if you just want to try the `base` model. Maybe I'll take another pass at improving the "Quick start" section after #1806 merges (or is closed). For someone who wants to just dive in, it seems a little confusing to have basic useful info about `make` after a bunch of console output.

3. The list showing memory usage (./README.md#memory-usage) only shows 5 models; should there be an issue about bringing this up to date? Or at least adding `-v1` to `large`, if that's the case?

4. The list of models in `./models/README.md` gives SHA-1 file hashes, which is nice because they aren't long (keeps the table tidy), but also a smidge inconvenient, since Hugging Face provides SHA-256 file hashes (e.g. for large-v3). I suppose this could be considered a bonus, as one could, after downloading a model, compute both hashes locally and compare each against its source (`./models/README.md` & Hugging Face) for additional peace of mind / security.

5. `./Makefile` doesn't seem to support all models, only those it is explicitly aware of, e.g. it doesn't know about `large-v3-q5_0` or `small.en-tdrz`, so it fails with `` make: *** No rule to make target `large-v3-q5_0'. Stop. `` So the list of models in `./Makefile` should be kept somewhat in sync with e.g. `./models/download-ggml-model.sh` (see point 1)? Maybe a single source of truth would be nice?
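
To make the hash comparison mentioned above concrete (SHA-1 from `./models/README.md`, SHA-256 from Hugging Face), here is a rough sketch of how one might compute both digests of a downloaded model in a single pass. The function names `file_digests` and `matches` are made up for illustration and are not part of whisper.cpp; the expected values would have to be copied by hand from the two sources.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return (sha1_hex, sha256_hex) for the file at `path`.

    The file is read in chunks so that multi-gigabyte models
    do not have to fit in memory.
    """
    sha1 = hashlib.sha1()
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
            sha256.update(chunk)
    return sha1.hexdigest(), sha256.hexdigest()


def matches(path, expected_sha1=None, expected_sha256=None):
    """Check a file against whichever expected hashes are provided.

    Hypothetical helper: `expected_sha1` would come from the table in
    ./models/README.md, `expected_sha256` from the Hugging Face file page.
    """
    sha1, sha256 = file_digests(path)
    ok = True
    if expected_sha1 is not None:
        ok = ok and sha1 == expected_sha1.strip().lower()
    if expected_sha256 is not None:
        ok = ok and sha256 == expected_sha256.strip().lower()
    return ok
```

Usage would be something like `matches(model_path, expected_sha1=..., expected_sha256=...)`, where `model_path` points at whatever file the download script produced. Note that with no expected value supplied the check passes trivially, so at least one hash should be given.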
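
Until there is a single source of truth, a small check could at least flag when the lists drift apart. The sketch below is only an illustration: the two lists are hand-written samples using model names mentioned in this issue, not the actual contents of `./Makefile` or `./models/download-ggml-model.sh`, and `missing_from` is a hypothetical helper. Extracting the real lists from those files is left out, since that depends on how each file is laid out.

```python
def missing_from(reference, candidate):
    """Return the names in `reference` that `candidate` lacks, in reference order."""
    known = set(candidate)
    return [name for name in reference if name not in known]


# Illustrative samples only (NOT the real file contents).
download_script_models = [
    "tiny.en", "base", "base.en", "large-v3", "large-v3-q5_0", "small.en-tdrz",
]
makefile_models = ["tiny.en", "base", "base.en", "large-v3"]

# Models the download script knows about but the Makefile has no target for.
print(missing_from(download_script_models, makefile_models))
# → ['large-v3-q5_0', 'small.en-tdrz']
```

Running the check in both directions would show whether one list is a strict subset of the other, which is the relationship described between the README, the download script, and Hugging Face.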