---
license: apache-2.0
license_link: LICENSE
tags:
- llamafile
---

# OpenAI Whisper - llamafile

Whisperfile is a high-performance implementation of [OpenAI's
Whisper](https://github.com/openai/whisper) created by Mozilla Ocho as
part of the [llamafile](https://github.com/Mozilla-Ocho/llamafile)
project, based on the
[whisper.cpp](https://github.com/ggerganov/whisper.cpp) software written
by Georgi Gerganov et al.

- Model creator: [OpenAI](https://huggingface.co./collections/openai/whisper-release-6501bba2cf999715fd953013)
- Original models: [openai/whisper-release](https://huggingface.co./collections/openai/whisper-release-6501bba2cf999715fd953013)
- Origin of quantized weights: [ggerganov/whisper.cpp](https://huggingface.co./ggerganov/whisper.cpp)

The model is packaged into executable weights, which we call
[whisperfiles](https://github.com/Mozilla-Ocho/llamafile/blob/0.8.13/whisper.cpp/doc/index.md).
This makes it easy to use the model on Linux, macOS, Windows, FreeBSD,
OpenBSD, and NetBSD, on both AMD64 and ARM64.

## Quickstart

Running the following commands on a desktop OS will transcribe the
speech in a wav/mp3/ogg/flac file into text. The `-pc` flag enables
confidence color coding.

```
wget https://huggingface.co./Mozilla/whisperfile/resolve/main/whisper-tiny.en.llamafile
wget https://huggingface.co./Mozilla/whisperfile/resolve/main/raven_poe_64kb.mp3
chmod +x whisper-tiny.en.llamafile
./whisper-tiny.en.llamafile -f raven_poe_64kb.mp3 -pc
```

![screenshot](screenshot.png)

There's also an HTTP server available:

```
./whisper-tiny.en.llamafile
```
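
If whisperfile follows whisper.cpp's example server API (an assumption
on our part; run `--help` to confirm the exact flags, endpoint, and
port it binds), a transcription request could look something like this:

```
# Hypothetical request, assuming a whisper.cpp-style server listening
# on its usual default port of 8080; adjust to what --help reports.
curl 127.0.0.1:8080/inference \
  -F file="@raven_poe_64kb.mp3" \
  -F response_format="json"
```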

You can also read the man page:

```
./whisper-tiny.en.llamafile --help
```

Having **trouble?** See the ["Gotchas"
section](https://github.com/mozilla-ocho/llamafile/?tab=readme-ov-file#gotchas-and-troubleshooting)
of the llamafile README.

## GPU Acceleration

The following flags are available to enable GPU support:

- `--gpu nvidia`
- `--gpu metal`
- `--gpu amd`
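
For example, to run the quickstart transcription on an NVIDIA GPU:

```
./whisper-tiny.en.llamafile --gpu nvidia -f raven_poe_64kb.mp3 -pc
```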

The medium and large whisperfiles contain prebuilt dynamic shared
objects for Linux and Windows. If you download one of the other models,
then you'll need to install the CUDA or ROCm SDK and pass `--recompile`
to build a GGML CUDA module for your system.

On Windows, if you own an NVIDIA GPU, you only need to install the
graphics card driver. If you have an AMD GPU instead, install the ROCm
SDK v6.1 and then pass the flags `--recompile --gpu amd` the first time
you run your whisperfile.
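
For example, that first run on Windows with an AMD GPU might look like
this (reusing the quickstart audio file):

```
./whisper-tiny.en.llamafile --recompile --gpu amd -f raven_poe_64kb.mp3
```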

On NVIDIA GPUs, the prebuilt tinyBLAS library is used by default to
perform matrix multiplications. tinyBLAS is open-source software, but
it isn't as fast as the closed-source cuBLAS. If you have the CUDA SDK
installed on your system, you can pass the `--recompile` flag to build
a GGML CUDA library just for your system that uses cuBLAS, ensuring
maximum performance.
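
For example, with the CUDA SDK installed:

```
./whisper-tiny.en.llamafile --recompile --gpu nvidia -f raven_poe_64kb.mp3
```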

For further information, please see the [llamafile
README](https://github.com/mozilla-ocho/llamafile/).

## Documentation

See the [whisperfile
documentation](https://github.com/Mozilla-Ocho/llamafile/blob/6287b60/whisper.cpp/doc/index.md)
for tutorials and further details.