The entire high-level implementation of the model is contained in [whisper.h](whisper.h) and [whisper.cpp](whisper.cpp).
The rest of the code is part of the [`ggml`](https://github.com/ggerganov/ggml) machine learning library.
Having such a lightweight implementation of the model makes it easy to integrate into different platforms and applications.
As an example, here is a video of running the model on an iPhone 13 device - fully offline, on-device: [whisper.objc](examples/whisper.objc)
Or you can even run it straight in the browser: [talk.wasm](examples/talk.wasm)
- Sample real-time audio transcription from the microphone is demonstrated in [stream.cpp](examples/stream)
- Various other examples are available in the [examples](examples) folder
The tensor operators are optimized heavily for Apple silicon CPUs. Depending on the computation size, Arm Neon SIMD intrinsics or CBLAS Accelerate framework routines are used. The latter are especially effective for bigger sizes since the Accelerate framework utilizes the special-purpose AMX coprocessor available in modern Apple products.
## Quick start
First clone the repository:
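A minimal sketch, using the repository URL as it appears in the links in this document:

```bash
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
```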
Then, download one of the Whisper models converted in [`ggml` format](models). For example:
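A sketch using the download helper script that ships in the [models](models) folder (script name assumed; `base.en` picked as an example model):

```bash
# Download the base.en model in ggml format into ./models/
bash ./models/download-ggml-model.sh base.en
```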
## Core ML support

Running the encoder via Core ML on Apple silicon can result in a significant speed-up - more than x3 faster compared with CPU-only execution. Here are the instructions:
- To ensure `coremltools` operates correctly, please confirm that [Xcode](https://developer.apple.com/xcode/) is installed and execute `xcode-select --install` to install the command-line tools.
- Python 3.10 is recommended.
- [OPTIONAL] It is recommended to utilize a Python version management system, such as [Miniconda](https://docs.conda.io/en/latest/miniconda.html) for this step:
  - To create an environment, use: `conda create -n py310-whisper python=3.10 -y`
  - To activate the environment, use: `conda activate py310-whisper`
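With the environment in place, the Core ML encoder model can then be generated; the helper script name below is an assumption based on the scripts in the [models](models) folder, so verify it before running:

```bash
# Generate a Core ML encoder model for the base.en ggml model (script name assumed)
./models/generate-coreml-model.sh base.en
```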
## OpenVINO support

On supported platforms, the encoder inference can also be executed with OpenVINO. This can result in significant speedup in encoder performance.
Converting a model for OpenVINO produces `ggml-base.en-encoder-openvino.xml/.bin` IR model files. It's recommended to relocate these to the same folder as the `ggml` models, as that is the default location that the OpenVINO extension will search at runtime.
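For example (paths are illustrative; adjust them to wherever your `ggml` models are stored):

```bash
# Move the generated OpenVINO IR files next to the ggml models
mv ggml-base.en-encoder-openvino.xml ggml-base.en-encoder-openvino.bin ./models/
```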
- Build `whisper.cpp` with OpenVINO support:
After downloading and extracting the OpenVINO package onto your development system, set up the required environment by sourcing the `setupvars` script. For example:
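Below is a sketch for Linux; the package path is a placeholder and the `WHISPER_OPENVINO` CMake option name is an assumption, so verify both against your installation and the project's build options:

```bash
# Make the OpenVINO runtime visible in the current shell (path is a placeholder)
source /path/to/openvino/setupvars.sh

# Configure and build with the OpenVINO encoder enabled
# (option name assumed - verify against the project's CMake options)
cmake -B build -DWHISPER_OPENVINO=1
cmake --build build --config Release
```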
## Docker
- Docker must be installed and running on your system.
- Create a folder to store big models & intermediate files (e.g. /whisper/models)
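For instance, using the location suggested above:

```bash
# Folder for models and intermediate files (location taken from the prerequisite above)
mkdir -p /whisper/models
```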
### Images
We have two Docker images available for this project:
1. `ghcr.io/ggerganov/whisper.cpp:main`: This image includes the main executable file as well as `curl` and `ffmpeg`. (platforms: `linux/amd64`, `linux/arm64`)
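A hedged usage sketch for this image; the mounted paths and the command string passed to the container are illustrative, so adapt them to your setup:

```bash
# Pull the published image
docker pull ghcr.io/ggerganov/whisper.cpp:main

# Transcribe a local audio file, mounting model and audio folders into the container
# (folder names and the command string are illustrative)
docker run -it --rm \
  -v "$(pwd)/models:/models" \
  -v "$(pwd)/samples:/audios" \
  ghcr.io/ggerganov/whisper.cpp:main \
  "./main -m /models/ggml-base.en.bin -f /audios/jfk.wav"
```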
Use the [extra/bench-wts.sh](https://github.com/ggerganov/whisper.cpp/blob/master/extra/bench-wts.sh) script to generate a video in the following format:
```bash
./extra/bench-wts.sh samples/jfk.wav
ffplay ./samples/jfk.wav.all.mp4
```
The repository also includes a benchmarking script, written in Python with the intention of being easy to modify and extend. It outputs a CSV file with the results of the benchmarking.
## `ggml` format
The original models are converted to a custom binary format, which allows everything needed to be packed into a single file.
You can download the converted models using the scripts in the [models](models) folder or manually from here:
- https://huggingface.co/ggerganov/whisper.cpp
- https://ggml.ggerganov.com
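For example, a manual download of the `base.en` model from the Hugging Face repository above (file name assumed to follow the `ggml-<model>.bin` naming used throughout this README):

```bash
# Fetch the converted base.en model into ./models/
wget -P ./models https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.en.bin
```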
For more details, see the conversion script [models/convert-pt-to-ggml.py](models/convert-pt-to-ggml.py) or [models/README.md](models/README.md).
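A rough invocation sketch; the argument order (original PyTorch checkpoint, path to a checkout of the OpenAI whisper repository, output directory) is an assumption, so consult the script header or [models/README.md](models/README.md) for the exact usage:

```bash
# Convert an original Whisper checkpoint to ggml format (argument order assumed)
python3 models/convert-pt-to-ggml.py ~/.cache/whisper/base.en.pt /path/to/openai-whisper ./models
```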
## Examples

| Example | Web | Description |
| --- | --- | --- |
| [talk](examples/talk) | [talk.wasm](examples/talk.wasm) | Talk with a GPT-2 bot |
| [talk-llama](examples/talk-llama) | | Talk with a LLaMA bot |
| [whisper.objc](examples/whisper.objc) | | iOS mobile application using whisper.cpp |
| [whisper.swiftui](examples/whisper.swiftui) | | SwiftUI iOS / macOS application using whisper.cpp |
| [whisper.android](examples/whisper.android) | | Android mobile application using whisper.cpp |
| [whisper.nvim](examples/whisper.nvim) | | Speech-to-text plugin for Neovim |
| [generate-karaoke.sh](examples/generate-karaoke.sh) | | Helper script to easily [generate a karaoke video](https://youtu.be/uj7hVta4blM) of raw audio capture |