Fedir Zadniprovskyi committed on
Commit 4b9d55e · 1 Parent(s): f7ddddf

docs: init

.pre-commit-config.yaml CHANGED
@@ -44,4 +44,4 @@ repos:
   rev: v1.5.0
   hooks:
     - id: detect-secrets
-      exclude: 'README.md|tests/conftest.py'
+      exclude: 'README.md|tests/conftest.py|docs/usage.md'
docs/configuration.md ADDED
@@ -0,0 +1,22 @@
+<!-- https://mkdocstrings.github.io/python/usage/configuration/general/ -->
+::: faster_whisper_server.config.Config
+    options:
+      show_bases: true
+      show_if_no_docstring: true
+      show_labels: false
+      separate_signature: true
+      show_signature_annotations: true
+      signature_crossrefs: true
+      summary: false
+      source: true
+      members_order: source
+      filters:
+        - "!model_config"
+        - "!chat_completion_*"
+        - "!speech_*"
+        - "!transcription_*"
+
+::: faster_whisper_server.config.WhisperConfig
+
+<!-- TODO: nested model `whisper` -->
+<!-- TODO: Insert new lines for multi-line docstrings -->
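+
+All of the options documented above are set through environment variables. Nested fields use the `__` delimiter (per `env_nested_delimiter` in `config.py`), and `host`/`port` are aliased to `UVICORN_HOST`/`UVICORN_PORT`. A minimal sketch, assuming `Config` nests `WhisperConfig` under a `whisper` field (as the TODO above hints):
+
+```python
+# Minimal sketch; the `whisper` field name is an assumption (see the TODO above).
+import os
+
+from faster_whisper_server.config import Config
+
+os.environ['WHISPER__MODEL'] = 'Systran/faster-distil-whisper-large-v3'
+os.environ['UVICORN_PORT'] = '9000'
+
+config = Config()
+print(config.whisper.model)  # Systran/faster-distil-whisper-large-v3
+print(config.port)  # 9000
+```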
docs/index.md DELETED
@@ -1 +0,0 @@
-Coming soon...
docs/installation.md ADDED
@@ -0,0 +1,101 @@
+## Docker Compose (Recommended)
+
+TODO: just reference the existing compose file in the repo
+
+=== "CUDA"
+
+    ```yaml
+    # https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
+    services:
+      faster-whisper-server:
+        image: fedirz/faster-whisper-server:latest-cuda
+        container_name: faster-whisper-server
+        restart: unless-stopped
+        ports:
+          - 8000:8000
+        volumes:
+          - hugging_face_cache:/root/.cache/huggingface
+        deploy:
+          resources:
+            reservations:
+              devices:
+                - capabilities: ["gpu"]
+    volumes:
+      hugging_face_cache:
+    ```
+
+=== "CUDA (with CDI feature enabled)"
+
+    ```yaml
+    # https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
+    services:
+      faster-whisper-server:
+        image: fedirz/faster-whisper-server:latest-cuda
+        container_name: faster-whisper-server
+        restart: unless-stopped
+        ports:
+          - 8000:8000
+        volumes:
+          - hugging_face_cache:/root/.cache/huggingface
+        deploy:
+          resources:
+            reservations:
+              # https://docs.docker.com/reference/cli/dockerd/#enable-cdi-devices
+              # https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html
+              devices:
+                - driver: cdi
+                  device_ids:
+                    - nvidia.com/gpu=all
+    volumes:
+      hugging_face_cache:
+    ```
+
+=== "CPU"
+
+    ```yaml
+    services:
+      faster-whisper-server:
+        image: fedirz/faster-whisper-server:latest-cpu
+        container_name: faster-whisper-server
+        restart: unless-stopped
+        ports:
+          - 8000:8000
+        volumes:
+          - hugging_face_cache:/root/.cache/huggingface
+    volumes:
+      hugging_face_cache:
+    ```
+
+## Docker
+
+=== "CUDA"
+
+    ```bash
+    docker run --rm --detach --publish 8000:8000 --name faster-whisper-server --volume hugging_face_cache:/root/.cache/huggingface --gpus=all fedirz/faster-whisper-server:latest-cuda
+    ```
+
+=== "CUDA (with CDI feature enabled)"
+
+    ```bash
+    docker run --rm --detach --publish 8000:8000 --name faster-whisper-server --volume hugging_face_cache:/root/.cache/huggingface --device=nvidia.com/gpu=all fedirz/faster-whisper-server:latest-cuda
+    ```
+
+=== "CPU"
+
+    ```bash
+    docker run --rm --detach --publish 8000:8000 --name faster-whisper-server --volume hugging_face_cache:/root/.cache/huggingface fedirz/faster-whisper-server:latest-cpu
+    ```
+
+## Kubernetes
+
+WARNING: this was written a few months ago and may be outdated.
+Please refer to this [blog post](https://substratus.ai/blog/deploying-faster-whisper-on-k8s).
+
+## Python (requires Python 3.12+)
+
+```bash
+git clone https://github.com/fedirz/faster-whisper-server.git
+cd faster-whisper-server
+uv venv
+source .venv/bin/activate
+uv sync --all-extras
+uvicorn --factory --host 0.0.0.0 faster_whisper_server.main:create_app
+```
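+
+Once the server is running, you can sanity-check it from Python. This assumes the OpenAI-compatible model-listing endpoint is exposed at `GET /v1/models` (model listing is mentioned in the usage docs, but verify the route):
+
+```python
+# Quick sanity check that the server is reachable.
+# Assumption: GET /v1/models is exposed, as on OpenAI-compatible servers.
+import httpx
+
+response = httpx.get('http://localhost:8000/v1/models')
+response.raise_for_status()
+print(response.json())
+```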
docs/introduction.md ADDED
@@ -0,0 +1,32 @@
+!!! warning
+
+    Under development. I don't recommend using these docs as a reference for now.
+
+# Faster Whisper Server
+
+`faster-whisper-server` is an OpenAI API-compatible transcription server that uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) as its backend.
+Features:
+
+- GPU and CPU support.
+- Easily deployable using Docker.
+- **Configurable through environment variables (see [config.py](./src/faster_whisper_server/config.py))**.
+- OpenAI API compatible.
+- Streaming support (transcription is sent via [SSE](https://en.wikipedia.org/wiki/Server-sent_events) as the audio is transcribed; you don't need to wait for the audio to be fully transcribed before receiving it).
+- Live transcription support (audio is sent via websocket as it's generated).
+- Dynamic model loading / offloading: just specify which model you want to use in the request and it will be loaded automatically, then unloaded after a period of inactivity.
+
+Please create an issue if you find a bug, have a question, or have a feature suggestion.
+
+## OpenAI API Compatibility ++
+
+See the [OpenAI API reference](https://platform.openai.com/docs/api-reference/audio) for more information.
+
+- Audio file transcription via the `POST /v1/audio/transcriptions` endpoint.
+    - Unlike OpenAI's API, `faster-whisper-server` also supports streaming transcriptions (and translations). This is useful when you want to process large audio files and would rather receive the transcription in chunks as they are processed, rather than waiting for the whole file to be transcribed. It works similarly to chat messages when chatting with LLMs.
+- Audio file translation via the `POST /v1/audio/translations` endpoint.
+- Live audio transcription via the `WS /v1/audio/transcriptions` endpoint (a rough client sketch follows this list).
+    - The LocalAgreement2 ([paper](https://aclanthology.org/2023.ijcnlp-demo.3.pdf) | [original implementation](https://github.com/ufal/whisper_streaming)) algorithm is used for live transcription.
+    - Only single-channel, 16000 Hz, raw, 16-bit little-endian audio is supported.
+
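+A rough sketch of a live-transcription client, assuming the `WS /v1/audio/transcriptions` endpoint accepts raw binary audio frames and pushes transcription messages back over the same connection (the shape of the server's responses is an assumption here, so treat this as a starting point rather than a reference client):
+
+```python
+# Hedged sketch of a live-transcription client; 'audio.pcm' is a hypothetical
+# pre-converted file (single channel, 16000 Hz, raw 16-bit little-endian).
+import asyncio
+
+import websockets  # pip install websockets
+
+
+async def main() -> None:
+    async with websockets.connect('ws://localhost:8000/v1/audio/transcriptions') as ws:
+        with open('audio.pcm', 'rb') as f:
+            while chunk := f.read(16000 * 2):  # ~1 second of s16le audio
+                await ws.send(chunk)
+                await asyncio.sleep(1)  # pace the upload roughly in real time
+        async for message in ws:  # print transcriptions as they arrive
+            print(message)
+
+
+asyncio.run(main())
+```
+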
+TODO: add a note about gradio ui
+TODO: add a note about hf space
docs/usage.md ADDED
@@ -0,0 +1,86 @@
+TODO: break this down into: transcription/translation, streaming transcription/translation, live transcription, audio generation, model listing
+TODO: add video demos for all
+TODO: add a note about OPENAI_API_KEY
+
+## Curl
+
+```bash
+curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav"
+```
+
+## Python
+
+=== "httpx"
+
+    ```python
+    import httpx
+
+    with open('audio.wav', 'rb') as f:
+        files = {'file': ('audio.wav', f)}
+        response = httpx.post('http://localhost:8000/v1/audio/transcriptions', files=files)
+
+    print(response.text)
+    ```
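+
+If you'd rather receive the transcription in chunks as the audio is processed (the streaming mode described in the introduction), something along these lines should work with `httpx`; note that the `stream` form field name is an assumption, so verify it against the API:
+
+```python
+# Streaming transcription over SSE (assumption: a `stream` form field enables
+# server-sent events; double-check the parameter name against the API).
+import httpx
+
+with open('audio.wav', 'rb') as f:
+    files = {'file': ('audio.wav', f)}
+    with httpx.stream(
+        'POST',
+        'http://localhost:8000/v1/audio/transcriptions',
+        files=files,
+        data={'stream': 'true'},
+    ) as response:
+        for line in response.iter_lines():
+            if line.startswith('data: '):
+                print(line.removeprefix('data: '))
+```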
+
+## OpenAI SDKs
+
+=== "Python"
+
+    ```python
+    from openai import OpenAI
+
+    client = OpenAI(base_url='http://localhost:8000/v1/', api_key='cant-be-empty')
+
+    with open('audio.wav', 'rb') as f:
+        transcript = client.audio.transcriptions.create(
+            model='Systran/faster-whisper-small', file=f
+        )
+
+    print(transcript.text)
+    ```
+
+=== "CLI"
+
+    ```bash
+    export OPENAI_BASE_URL=http://localhost:8000/v1/
+    export OPENAI_API_KEY="cant-be-empty"
+    openai api audio.transcriptions.create -m Systran/faster-whisper-small -f audio.wav --response-format text
+    ```
+
+=== "Other"
+
+    See [OpenAI libraries](https://platform.openai.com/docs/libraries) and [OpenAI speech-to-text usage](https://platform.openai.com/docs/guides/speech-to-text).
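+
+Audio translation (`POST /v1/audio/translations`) works the same way through the SDK; a short sketch reusing the connection settings above:
+
+```python
+# Translate audio to English using the OpenAI SDK against the local server.
+from openai import OpenAI
+
+client = OpenAI(base_url='http://localhost:8000/v1/', api_key='cant-be-empty')
+
+with open('audio.wav', 'rb') as f:
+    translation = client.audio.translations.create(
+        model='Systran/faster-whisper-small', file=f
+    )
+
+print(translation.text)
+```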
+
+## Open WebUI
+
+### Using the UI
+
+1. Go to the [Admin Settings](http://localhost:8080/admin/settings) page
+2. Click on the "Audio" tab
+3. Update settings
+    - Speech-to-Text Engine: OpenAI
+    - API Base URL: http://faster-whisper-server:8000/v1
+    - API Key: does-not-matter-what-you-put-but-should-not-be-empty
+    - Model: Systran/faster-distil-whisper-large-v3
+4. Click "Save"
+
+### Using environment variables (Docker Compose)
+
+!!! warning
+
+    This doesn't seem to work when you've previously used the UI to set the STT engine.
+
+```yaml
+# NOTE: Some parts of the file are omitted for brevity.
+services:
+  open-webui:
+    image: ghcr.io/open-webui/open-webui:main
+    ...
+    environment:
+      ...
+      # Environment variables are documented here: https://docs.openwebui.com/getting-started/env-configuration#speech-to-text
+      AUDIO_STT_ENGINE: "openai"
+      AUDIO_STT_OPENAI_API_BASE_URL: "http://faster-whisper-server:8000/v1"
+      AUDIO_STT_OPENAI_API_KEY: "does-not-matter-what-you-put-but-should-not-be-empty"
+      AUDIO_STT_MODEL: "Systran/faster-distil-whisper-large-v3"
+  faster-whisper-server:
+    image: fedirz/faster-whisper-server:latest-cuda
+    ...
+```
mkdocs.yml CHANGED
@@ -20,7 +20,10 @@ plugins:
   - mkdocstrings:
       default_handler: python
 nav:
-  - Home: index.md
+  - Introduction: introduction.md
+  - Installation: installation.md
+  - Configuration: configuration.md
+  - Usage: usage.md
 markdown_extensions:
   - admonition
   - pymdownx.superfences
src/faster_whisper_server/config.py CHANGED
@@ -38,6 +38,7 @@ class Quantization(enum.StrEnum):
     DEFAULT = "default"
 
 
+# TODO: this needs to be rethought
 class Language(enum.StrEnum):
     AF = "af"
     AM = "am"
@@ -151,7 +152,7 @@ class WhisperConfig(BaseModel):
 
     model: str = Field(default="Systran/faster-whisper-small")
     """
-    Default Huggingface model to use for transcription. Note, the model must support being ran using CTranslate2.
+    Default HuggingFace model to use for transcription. Note that the model must support being run using CTranslate2.
     This model will be used if no model is specified in the request.
 
     Models created by authors of `faster-whisper` can be found at https://huggingface.co/Systran
@@ -174,6 +175,7 @@
     """  # noqa: E501
 
 
+# TODO: document `alias` behaviour within the docstring
 class Config(BaseSettings):
     """Configuration for the application. Values can be set via environment variables.
 
@@ -185,7 +187,13 @@ class Config(BaseSettings):
     model_config = SettingsConfigDict(env_nested_delimiter="__")
 
     api_key: str | None = None
+    """
+    If set, the API key will be required for all requests.
+    """
     log_level: str = "debug"
+    """
+    Logging level. One of: 'debug', 'info', 'warning', 'error', 'critical'.
+    """
     host: str = Field(alias="UVICORN_HOST", default="0.0.0.0")
     port: int = Field(alias="UVICORN_PORT", default=8000)
     allow_origins: list[str] | None = None
@@ -198,8 +206,8 @@
 
     enable_ui: bool = True
     """
-    Whether to enable the Gradio UI. You may want to disable this if you want to minimize the dependencies.
-    """
+    Whether to enable the Gradio UI. You may want to disable this if you want to minimize the dependencies and slightly improve the startup time.
+    """  # noqa: E501
 
     default_language: Language | None = None
     """
@@ -216,26 +224,35 @@
         ],
     )
     """
-    List of models to preload on startup. By default, the model is first loaded on first request.
+    List of Whisper models to preload on startup. By default, the model is first loaded on first request.
+    WARNING: I'd recommend not setting this, as it may be deprecated in the future.
    """
     max_no_data_seconds: float = 1.0
     """
     Max duration to wait for the next audio chunk before transcription is finilized and connection is closed.
+    Used only for live transcription (WS /v1/audio/transcriptions).
     """
     min_duration: float = 1.0
     """
     Minimum duration of an audio chunk that will be transcribed.
+    Used only for live transcription (WS /v1/audio/transcriptions).
     """
     word_timestamp_error_margin: float = 0.2
+    """
+    Used only for live transcription (WS /v1/audio/transcriptions).
+    """
     max_inactivity_seconds: float = 2.5
     """
     Max allowed audio duration without any speech being detected before transcription is finilized and connection is closed.
+    Used only for live transcription (WS /v1/audio/transcriptions).
     """  # noqa: E501
     inactivity_window_seconds: float = 5.0
     """
-    Controls how many latest seconds of audio are being passed through VAD.
-    Should be greater than `max_inactivity_seconds`
-    """
+    Controls how many of the latest seconds of audio are passed through VAD. Should be greater than `max_inactivity_seconds`.
+    Used only for live transcription (WS /v1/audio/transcriptions).
+    """  # noqa: E501
+
+    # NOTE: options below are not used yet and should be ignored. Added as a placeholder for future features I'm currently working on.  # noqa: E501
 
     chat_completion_base_url: str = "https://api.openai.com/v1"
     chat_completion_api_key: str | None = None
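
The hunk above leans on two pydantic-settings mechanisms that the TODO says still need documenting: `alias` makes a field read from a differently named environment variable (here `UVICORN_HOST`/`UVICORN_PORT`), and `env_nested_delimiter="__"` routes environment variables into nested models. A self-contained toy sketch (not the project's actual `Config`):

```python
# Toy models illustrating `alias` and `env_nested_delimiter="__"`;
# not the project's actual Config.
from pydantic import BaseModel, Field
from pydantic_settings import BaseSettings, SettingsConfigDict


class ToyWhisperConfig(BaseModel):
    model: str = 'Systran/faster-whisper-small'


class ToySettings(BaseSettings):
    model_config = SettingsConfigDict(env_nested_delimiter='__')

    host: str = Field(alias='UVICORN_HOST', default='0.0.0.0')  # set via UVICORN_HOST
    whisper: ToyWhisperConfig = ToyWhisperConfig()  # model set via WHISPER__MODEL


# e.g. UVICORN_HOST=127.0.0.1 WHISPER__MODEL=tiny python toy.py
settings = ToySettings()
print(settings.host, settings.whisper.model)
```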