Keeping model lists in sync, clarify `make` behavior (docs) #1807

While working on #1806, I came across the following:

1. The list of models in `./models/README.md` is currently a subset of the list of models in `./models/download-ggml-model.sh`, which is itself a subset of what is available from https://huggingface.co/ggerganov/whisper.cpp/tree/main (bulk of options), https://ggml.ggerganov.com/ (duplicates of first only? Didn't check carefully), and https://huggingface.co/akashmjn/tinydiarize-whisper.cpp/tree/main (just one). Is there some logic as to which models are included / not included? I was tempted to add more to both lists, but wanted to ask first about inclusion criteria.
2. In [./README.md#more-audio-samples](https://github.com/ggerganov/whisper.cpp/blob/master/README.md#more-audio-samples), there's a list of `make` commands starting with `make tiny.en`; perhaps this used to be in a different section higher up? Are they still current / helpful? Oh, huh, I probably should've also changed this bit (in the "Quick start" section):
    > Now build the [main](https://github.com/ggerganov/whisper.cpp/blob/master/examples/main) example and transcribe an audio file like this:
    > 
    > ```bash
    > # build the main example
    > make
    > 
    > # transcribe an audio file
    > ./main -f samples/jfk.wav
    > ```

    ... to clarify that `make` alone defaults to the `base` model, and also downloads the model if necessary (mentioned lower down: "The command downloads the `base.en` model converted to custom `ggml` format and runs the inference on all `.wav` samples in the folder `samples`."), meaning there's no need to run `bash ./models/download-ggml-model.sh base.en` if you just want to try the `base` model. Maybe I'll take another pass at improving the "Quick start" section after [#1806](https://github.com/ggerganov/whisper.cpp/pull/1806) merges (or is closed). For someone who wants to just dive in, it seems a little confusing to have basic useful info about `make` after a bunch of console output.
4. The list showing memory usage ([./README.md#memory-usage](https://github.com/ggerganov/whisper.cpp/blob/master/README.md#memory-usage)) only shows 5 models. Should there be an issue about bringing this up to date? Or at least adding `-v1` to `large`, if that's the case?
5. The list of models in `./models/README.md` gives SHA-1 file hashes, which is nice because they aren't long (keeps the table tidy), but also a smidge inconvenient, since Hugging Face provides SHA-256 file hashes (e.g. for [large-v3](https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-large-v3.bin)). I suppose this could be considered a bonus, as one could (after downloading a model) compute both hashes locally & compare them against both sources (`./models/README.md` & Hugging Face) for additional peace of mind / security.
6. `./Makefile` doesn't seem to support all models, only those it is explicitly aware of, e.g. it doesn't know about `large-v3-q5_0` or `small.en-tdrz`, so it fails with ``make: *** No rule to make target `large-v3-q5_0'.  Stop.`` So the list of models in `./Makefile` should be kept somewhat in sync with e.g. `./models/download-ggml-model.sh` (see point 1)? Maybe a single source of truth would be nice?
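
For the local check mentioned in point 5, here's a minimal sketch of what I have in mind. The helper name `model_hashes` is made up for illustration (it isn't part of the repo), and it assumes either GNU coreutils (`sha1sum` / `sha256sum`) or `shasum` (as shipped with macOS) is on the `PATH`:

```bash
# Hypothetical helper (not part of whisper.cpp): print the SHA-1 digest and
# then the SHA-256 digest of a file, one per line.
# Prefers sha1sum/sha256sum (GNU coreutils), falls back to `shasum -a`.
model_hashes() {
    file="$1"
    if command -v sha256sum >/dev/null 2>&1; then
        sha1sum "$file" | cut -d ' ' -f 1
        sha256sum "$file" | cut -d ' ' -f 1
    else
        shasum -a 1 "$file" | cut -d ' ' -f 1
        shasum -a 256 "$file" | cut -d ' ' -f 1
    fi
}
```

Running something like `model_hashes models/ggml-large-v3.bin` (path assumed, adjust to wherever the model was saved) should give a first line to compare against the SHA-1 in `./models/README.md` and a second line to compare against the SHA-256 shown on Hugging Face.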
