# Faster Whisper Server

`faster-whisper-server` is an OpenAI API-compatible transcription server that uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) as its backend.

Features:

- GPU and CPU support.
- Easily deployable using Docker.
- **Configurable through environment variables (see [config.py](./src/faster_whisper_server/config.py))**.
- OpenAI API compatible.
- Streaming support: the transcription is sent via [SSE](https://en.wikipedia.org/wiki/Server-sent_events) as the audio is transcribed, so you don't need to wait for the whole file to be processed before receiving results.
- Live transcription support (audio is sent via websocket as it's generated).
- Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.

Please create an issue if you find a bug, have a question, or have a feature suggestion.

## OpenAI API Compatibility ++

See [OpenAI API reference](https://platform.openai.com/docs/api-reference/audio) for more information.

- Audio file transcription via `POST /v1/audio/transcriptions` endpoint.
  - Unlike OpenAI's API, `faster-whisper-server` also supports streaming transcriptions (and translations). This is useful when you want to process large audio files and receive the transcription in chunks as they are processed, rather than waiting for the whole file to be transcribed. It works similarly to streamed chat responses from LLMs.
- Audio file translation via `POST /v1/audio/translations` endpoint.
- Live audio transcription via `WS /v1/audio/transcriptions` endpoint.
  - LocalAgreement2 ([paper](https://aclanthology.org/2023.ijcnlp-demo.3.pdf) | [original implementation](https://github.com/ufal/whisper_streaming)) algorithm is used for live transcription.
  - Only single-channel, 16000 Hz, raw 16-bit little-endian audio is supported.
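
The audio format required for live transcription can be checked before anything is sent. The following is a minimal sketch, not part of this project: the helper name `wav_to_raw_pcm` is hypothetical, and it assumes the input is an uncompressed PCM WAV file. It uses only Python's standard `wave` module to verify the channel count, sample rate, and sample width, then returns the raw frames.

```python
import wave

# Format stated for the live transcription endpoint: mono, 16 kHz, 16-bit PCM.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_RATE = 16000
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample (16-bit)


def wav_to_raw_pcm(wav_file) -> bytes:
    """Return the raw PCM frames of a WAV file, or raise ValueError on a mismatch.

    `wav_file` is a path string or a binary file-like object. WAV stores
    16-bit samples as little-endian signed integers, so the frames can be
    used as-is once the header checks pass.
    """
    with wave.open(wav_file, "rb") as wav:
        actual = (wav.getnchannels(), wav.getframerate(), wav.getsampwidth())
        expected = (EXPECTED_CHANNELS, EXPECTED_SAMPLE_RATE, EXPECTED_SAMPLE_WIDTH)
        if actual != expected:
            raise ValueError(
                f"expected (channels, rate, width) {expected}, got {actual}"
            )
        return wav.readframes(wav.getnframes())
```

Files that fail the check would need to be resampled or downmixed first, for example with `ffmpeg -ac 1 -ar 16000`.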

## Quick Start

[Hugging Face Space](https://huggingface.co/spaces/fedirz/faster-whisper-server)

![image](https://github.com/fedirz/faster-whisper-server/assets/76551385/6d215c52-ded5-41d2-89a5-03a6fd113aa0)

### Using Docker Compose (Recommended)

NOTE: I'm using newer Docker Compose features. If you are using an older version of Docker Compose, you may need to update.

```bash
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.yaml

# for GPU support
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.cuda.yaml
docker compose --file compose.cuda.yaml up --detach
# for CPU only (use this if you don't have a GPU, as the image is much smaller)
curl --silent --remote-name https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.cpu.yaml
docker compose --file compose.cpu.yaml up --detach
```

### Using Docker

```bash
# for GPU support
docker run --gpus=all --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface --detach fedirz/faster-whisper-server:latest-cuda
# for CPU only (use this if you don't have a GPU, as the image is much smaller)
docker run --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface --env WHISPER__MODEL=Systran/faster-whisper-small --detach fedirz/faster-whisper-server:latest-cpu
```

### Using Kubernetes

Follow [this tutorial](https://substratus.ai/blog/deploying-faster-whisper-on-k8s).

## Usage

If you are looking for a step-by-step walkthrough, check out [this](https://www.youtube.com/watch?app=desktop&v=vSN-oAl6LVs) YouTube video.

### OpenAI API CLI

```bash
export OPENAI_API_KEY="cant-be-empty"
export OPENAI_BASE_URL=http://localhost:8000/v1/
```

```bash
openai api audio.transcriptions.create -m Systran/faster-distil-whisper-large-v3 -f audio.wav --response-format text

openai api audio.translations.create -m Systran/faster-distil-whisper-large-v3 -f audio.wav --response-format verbose_json
```

### OpenAI API Python SDK

```python
from openai import OpenAI

client = OpenAI(api_key="cant-be-empty", base_url="http://localhost:8000/v1/")

# Open the file in a context manager so it is closed after the request
with open("audio.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="Systran/faster-distil-whisper-large-v3", file=audio_file
    )
print(transcript.text)
```

### cURL

```bash
# If `model` isn't specified, the default model is used
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav"
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav" -F "stream=true"
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav" -F "model=Systran/faster-distil-whisper-large-v3"
# It's recommended that you always specify the language as that will reduce the transcription time
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav" -F "language=en"

curl http://localhost:8000/v1/audio/translations -F "file=@audio.wav"
```
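
With `stream=true`, the response arrives as server-sent events rather than a single body. The sketch below shows one way a client might extract the `data:` payloads from such a stream. It assumes standard SSE framing only. The helper name `iter_sse_data` is hypothetical, and what each payload contains depends on the response format you request.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete server-sent event in `lines`.

    Basic SSE framing: `data:` lines accumulate, and a blank line ends the
    event. Other fields (event:, id:, retry:) and comment lines are ignored.
    An event that is not terminated by a blank line is discarded.
    """
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the accumulated event, if any.
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            data.append(value[1:] if value.startswith(" ") else value)
```

Any iterable of decoded text lines works as input, such as the line iterator of an HTTP client's streaming response.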

### Live Transcription (using WebSocket)

From the [live-audio](./examples/live-audio) example:

https://github.com/fedirz/faster-whisper-server/assets/76551385/e334c124-af61-41d4-839c-874be150598f

[websocat](https://github.com/vi/websocat?tab=readme-ov-file#installation) must be installed.
The command below streams audio from a microphone for live transcription:

```bash
ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le - | websocat --binary ws://localhost:8000/v1/audio/transcriptions
```
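
The ffmpeg pipeline above emits a continuous stream of raw PCM. If you feed audio from Python instead, the byte stream has to be cut into messages yourself. The following sketch splits a PCM buffer into fixed-duration chunks: at 16 kHz, mono, 16-bit, one second of audio is 32,000 bytes. The helper name `chunk_pcm` and the 100 ms default are assumptions chosen for illustration, not values taken from this project.

```python
from typing import Iterator

SAMPLE_RATE = 16000  # samples per second
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)
CHANNELS = 1         # mono


def chunk_pcm(pcm: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Split raw PCM into chunks of `chunk_ms` milliseconds each.

    The final chunk may be shorter. Because one millisecond is 32 bytes in
    this format, every chunk boundary falls on a whole sample.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    bytes_per_ms = SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS // 1000  # 32 bytes
    chunk_size = bytes_per_ms * chunk_ms
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]
```

Each chunk would then be sent as one binary WebSocket message, which is the role `websocat --binary` plays in the shell pipeline.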