Fedir Zadniprovskyi committed
Commit b56d19a · 1 Parent(s): 73e7d80

docs: usage pages (and more)
.pre-commit-config.yaml CHANGED
@@ -44,4 +44,4 @@ repos:
     rev: v1.5.0
     hooks:
       - id: detect-secrets
-        exclude: 'README.md|tests/conftest.py|docs/usage.md'
+        exclude: 'README.md|tests/conftest.py|docs/usage/*'
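Note: pre-commit applies `exclude` as a Python regular expression via `re.search`, so `docs/usage/*` (literally "docs/usage" plus zero or more slashes) already matches every file under the new directory, though `docs/usage/.*` would be the more conventional spelling. A minimal sketch of how the pattern behaves, with hypothetical paths:

```python
# Sketch: how pre-commit's `exclude` regex is applied (paths are hypothetical).
import re

exclude = re.compile(r"README.md|tests/conftest.py|docs/usage/*")
print(bool(exclude.search("docs/usage/speech-to-text.md")))  # True -> excluded
print(bool(exclude.search("src/faster_whisper_server/main.py")))  # False
```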
README.md CHANGED
@@ -78,15 +78,15 @@ openai api audio.translations.create -m Systran/faster-distil-whisper-large-v3 -
 ### OpenAI API Python SDK
 
 ```python
+from pathlib import Path
+
 from openai import OpenAI
 
-client = OpenAI(api_key="cant-be-empty", base_url="http://localhost:8000/v1/")
+client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
 
-audio_file = open("audio.wav", "rb")
-transcript = client.audio.transcriptions.create(
-    model="Systran/faster-distil-whisper-large-v3", file=audio_file
-)
-print(transcript.text)
+with Path("audio.wav").open("rb") as f:
+    transcript = client.audio.transcriptions.create(model="Systran/faster-distil-whisper-large-v3", file=f)
+    print(transcript.text)
 ```
 
 ### cURL
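For reference, the translations endpoint visible in the hunk header follows the same SDK pattern; a minimal sketch, assuming the same model name as the transcription example:

```python
# Sketch: OpenAI-compatible translations call, mirroring the transcription
# example above (model name taken from the hunk header; not verified here).
from pathlib import Path

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")

with Path("audio.wav").open("rb") as f:
    translation = client.audio.translations.create(model="Systran/faster-distil-whisper-large-v3", file=f)
    print(translation.text)
```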
docs/introduction.md CHANGED
@@ -2,18 +2,26 @@
 
 Under development. I don't yet recommend using these docs as reference for now.
 
+TODO: add HuggingFace Space URL
+
 # Faster Whisper Server
 
-`faster-whisper-server` is an OpenAI API-compatible transcription server which uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) as its backend.
-Features:
+`faster-whisper-server` is an OpenAI API-compatible server supporting transcription, translation, and speech generation. It uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) for transcription/translation and [piper](https://github.com/rhasspy/piper) for text-to-speech.
+
+## Features:
 
 - GPU and CPU support.
-- Easily deployable using Docker.
-- **Configurable through environment variables (see [config.py](./src/faster_whisper_server/config.py))**.
-- OpenAI API compatible.
+- [Deployable via Docker Compose / Docker](./installation.md)
+- [Highly configurable](./configuration.md)
+- OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with `faster-whisper-server`.
 - Streaming support (transcription is sent via [SSE](https://en.wikipedia.org/wiki/Server-sent_events) as the audio is transcribed; you don't need to wait for the audio to be fully transcribed before receiving it).
 - Live transcription support (audio is sent via WebSocket as it's generated).
 - Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
+- (Coming soon) Audio generation (chat completions endpoint) | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
+    - Generate a spoken audio summary of a body of text (text in, audio out)
+    - Perform sentiment analysis on a recording (audio in, text out)
+    - Async speech-to-speech interactions with a model (audio in, audio out)
+- (Coming soon) Realtime API | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
 
 Please create an issue if you find a bug, have a question, or a feature suggestion.
 
docs/usage/live-transcription.md ADDED
@@ -0,0 +1,17 @@
+## Live Transcription (using WebSocket)
+
+!!! note
+
+    More content will be added here soon.
+
+TODO: fix link
+From the [live-audio](./examples/live-audio) example:
+
+https://github.com/fedirz/faster-whisper-server/assets/76551385/e334c124-af61-41d4-839c-874be150598f
+
+[websocat](https://github.com/vi/websocat?tab=readme-ov-file#installation) must be installed.
+The following command live-transcribes audio from a microphone:
+
+```bash
+ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le - | websocat --binary ws://localhost:8000/v1/audio/transcriptions
+```
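For reference, a minimal Python client sketch of the same pipeline, assuming (as the `websocat` command above does) that the endpoint accepts raw 16 kHz mono s16le PCM as binary WebSocket messages; the format of the server's reply messages is an assumption, not verified here:

```python
# Sketch using the `websockets` library (pip install websockets). Mirrors the
# ffmpeg/websocat pipeline above: binary PCM in, transcription messages out.
import asyncio
import sys

import websockets


async def main() -> None:
    async with websockets.connect("ws://localhost:8000/v1/audio/transcriptions") as ws:

        async def send_audio() -> None:
            loop = asyncio.get_running_loop()
            # Read raw PCM from stdin, e.g. piped in from the ffmpeg command above.
            while chunk := await loop.run_in_executor(None, sys.stdin.buffer.read, 4096):
                await ws.send(chunk)
            await ws.close()  # ends the receive loop below

        async def print_transcripts() -> None:
            async for message in ws:
                print(message)

        await asyncio.gather(send_audio(), print_transcripts())


asyncio.run(main())
```

It can be fed the same way as `websocat`: `ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le - | python client.py` (script name hypothetical).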
docs/{usage.md → usage/open-webui-integration.md} RENAMED
@@ -1,53 +1,3 @@
-TODO: break this down into: transcription/translation, streaming transcription/translation, live transcription, audio generation, model listing
-TODO: add video demos for all
-TODO: add a note about OPENAI_API_KEY
-
-## Curl
-
-```bash
-curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav"
-```
-
-## Python
-
-=== "httpx"
-
-    ```python
-    import httpx
-
-    with open('audio.wav', 'rb') as f:
-        files = {'file': ('audio.wav', f)}
-        response = httpx.post('http://localhost:8000/v1/audio/transcriptions', files=files)
-
-    print(response.text)
-    ```
-
-## OpenAI SDKs
-
-=== "Python"
-
-    ```python
-    import httpx
-
-    with open('audio.wav', 'rb') as f:
-        files = {'file': ('audio.wav', f)}
-        response = httpx.post('http://localhost:8000/v1/audio/transcriptions', files=files)
-
-    print(response.text)
-    ```
-
-=== "CLI"
-
-    ```bash
-    export OPENAI_BASE_URL=http://localhost:8000/v1/
-    export OPENAI_API_KEY="cant-be-empty"
-    openai api audio.transcriptions.create -m Systran/faster-whisper-small -f audio.wav --response-format text
-    ```
-
-=== "Other"
-
-    See [OpenAI libraries](https://platform.openai.com/docs/libraries) and [OpenAI speech-to-text usage](https://platform.openai.com/docs/guides/speech-to-text).
-
 ## Open WebUI
 
 ### Using the UI
docs/usage/speech-to-text.md ADDED
@@ -0,0 +1,54 @@
+https://platform.openai.com/docs/api-reference/audio/createTranscription
+https://platform.openai.com/docs/guides/speech-to-text
+
+TODO: add a note about automatic downloads
+TODO: add a note about api-key
+TODO: mention streaming
+TODO: add a demo
+TODO: talk about audio format
+
+## Curl
+
+```bash
+curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav"
+```
+
+## Python
+
+=== "httpx"
+
+    ```python
+    import httpx
+
+    with open('audio.wav', 'rb') as f:
+        files = {'file': ('audio.wav', f)}
+        response = httpx.post('http://localhost:8000/v1/audio/transcriptions', files=files)
+
+    print(response.text)
+    ```
+
+## OpenAI SDKs
+
+=== "Python"
+
+    ```python
+    from openai import OpenAI

+    client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
+
+    with open("audio.wav", "rb") as f:
+        transcript = client.audio.transcriptions.create(model="Systran/faster-whisper-small", file=f)
+    print(transcript.text)
+    ```
+
+=== "CLI"
+
+    ```bash
+    export OPENAI_BASE_URL=http://localhost:8000/v1/
+    export OPENAI_API_KEY="cant-be-empty"
+    openai api audio.transcriptions.create -m Systran/faster-whisper-small -f audio.wav --response-format text
+    ```
+
+=== "Other"
+
+    See [OpenAI libraries](https://platform.openai.com/docs/libraries).
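A related sketch: the removed usage.md TODO above mentions model listing, and since the server is OpenAI API-compatible, the SDK's models call should work against it, assuming it implements the OpenAI-compatible `GET /v1/models` endpoint (not verified here):

```python
# Sketch: list available models through the OpenAI SDK, assuming the
# server implements the OpenAI-compatible GET /v1/models endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
for model in client.models.list():
    print(model.id)
```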
docs/usage/text-to-speech.md ADDED
@@ -0,0 +1,98 @@
+!!! warning
+
+    This feature is not supported on ARM devices, only x86_64; I was unable to build [piper-phonemize](https://github.com/rhasspy/piper-phonemize) (my [fork](https://github.com/fedirz/piper-phonemize)) for ARM.
+
+https://platform.openai.com/docs/api-reference/audio/createSpeech
+https://platform.openai.com/docs/guides/text-to-speech
+http://localhost:8001/faster-whisper-server/api/
+TODO: add a note about automatic downloads
+TODO: add a note about api-key
+TODO: add a demo
+
+## Prerequisite
+
+Download the piper voices from the [HuggingFace model repository](https://huggingface.co/rhasspy/piper-voices).
+
+```bash
+# Download all voices (~15 minutes / ~7.7 GB)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices
+# Download all English voices (~4.5 minutes)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices --include 'en/**/*' 'voices.json'
+# Download all qualities of a specific voice (~4 seconds)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices --include 'en/en_US/amy/**/*' 'voices.json'
+# Download a specific quality of a specific voice (~2 seconds)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices --include 'en/en_US/amy/medium/*' 'voices.json'
+```
+
+!!! note
+
+    You can find audio samples of all the available voices [here](https://rhasspy.github.io/piper-samples/).
+
+## Curl
+
+```bash
+# Generate speech from text using the default values (response_format="mp3", speed=1.0, voice="en_US-amy-medium", etc.)
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!"}' --output audio.mp3
+# Specifying the output format
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!", "response_format": "wav"}' --output audio.wav
+# Specifying the audio speed
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!", "speed": 2.0}' --output audio.mp3
+
+# List available (downloaded) voices
+curl http://localhost:8000/v1/audio/speech/voices
+# List just the voice names
+curl http://localhost:8000/v1/audio/speech/voices | jq --raw-output '.[] | .voice'
+# List just the voices in your language
+curl --silent http://localhost:8000/v1/audio/speech/voices | jq --raw-output '.[] | select(.voice | startswith("en")) | .voice'
+
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!", "voice": "en_US-ryan-high"}' --output audio.mp3
+```
+
+## Python
+
+=== "httpx"
+
+    ```python
+    from pathlib import Path
+
+    import httpx
+
+    client = httpx.Client(base_url="http://localhost:8000/")
+    res = client.post(
+        "v1/audio/speech",
+        json={
+            "model": "piper",
+            "voice": "en_US-amy-medium",
+            "input": "Hello, world!",
+            "response_format": "mp3",
+            "speed": 1,
+        },
+    ).raise_for_status()
+    with Path("output.mp3").open("wb") as f:
+        f.write(res.read())
+    ```
+
+## OpenAI SDKs
+
+=== "Python"
+
+    ```python
+    from pathlib import Path
+
+    from openai import OpenAI
+
+    openai = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
+    res = openai.audio.speech.create(
+        model="piper",
+        voice="en_US-amy-medium",  # pyright: ignore[reportArgumentType]
+        input="Hello, world!",
+        response_format="mp3",
+        speed=1,
+    )
+    with Path("output.mp3").open("wb") as f:
+        f.write(res.response.read())
+    ```
+
+=== "Other"
+
+    See [OpenAI libraries](https://platform.openai.com/docs/libraries)
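A hedged Python equivalent of the voice-listing curl calls above, assuming the same response shape (a JSON list of objects with a `voice` field, as the `jq` filters imply):

```python
# Sketch: list downloaded piper voices, mirroring the curl/jq examples above.
# Assumes /v1/audio/speech/voices returns a JSON list of {"voice": ...} objects.
import httpx

res = httpx.get("http://localhost:8000/v1/audio/speech/voices")
res.raise_for_status()
voices = [v["voice"] for v in res.json()]
print([v for v in voices if v.startswith("en")])  # just the English voices
```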
mkdocs.yml CHANGED
@@ -1,6 +1,10 @@
 # yaml-language-server: $schema=https://squidfunk.github.io/mkdocs-material/schema.json
+# https://www.mkdocs.org/user-guide/configuration/#configuration
 site_name: Faster Whisper Server Documentation
-repo_url: https://github.com/fedirz/faster-whisper-server
+site_url: https://fedirz.github.io/faster-whisper-server/
+repo_url: https://github.com/fedirz/faster-whisper-server/
+edit_uri: edit/master/docs/
+docs_dir: docs
 theme:
   language: en
   name: material
@@ -9,13 +13,15 @@ theme:
     primary: deep orange
     accent: indigo
   features:
-    - content.tabs.link
-    - content.code.copy
+    # https://squidfunk.github.io/mkdocs-material/setup/setting-up-navigation/
     - navigation.instant
     - navigation.instant.progress
     - navigation.instant.prefetch
+    # https://squidfunk.github.io/mkdocs-material/setup/setting-up-site-search/
     - search.highlight
     - search.share
+    - content.tabs.link
+    - content.code.copy
 plugins:
   # https://github.com/bharel/mkdocs-render-swagger-plugin
   - render_swagger
@@ -23,9 +29,13 @@ plugins:
       default_handler: python
 nav:
   - Introduction: introduction.md
+  - Capabilities / Usage:
+      - Speech-to-Text: usage/speech-to-text.md
+      - Text-to-Speech: usage/text-to-speech.md
+      - Live Transcription (using WebSockets): usage/live-transcription.md
+      - Open WebUI Integration: usage/open-webui-integration.md
   - Installation: installation.md
   - Configuration: configuration.md
-  - Usage: usage.md
   - API: api.md
 markdown_extensions:
   - admonition
@@ -34,3 +44,4 @@ markdown_extensions:
       alternate_style: true
   # https://github.com/mkdocs/mkdocs/issues/545
   - mdx_truly_sane_lists
+  # TODO: https://github.com/oprypin/markdown-callouts