Commit b56d19a · Fedir Zadniprovskyi committed
Parent: 73e7d80

docs: usage pages (and more)
Files changed:

- .pre-commit-config.yaml (+1 -1)
- README.md (+6 -6)
- docs/introduction.md (+13 -5)
- docs/usage/live-transcription.md (+17 -0)
- docs/{usage.md → usage/open-webui-integration.md} (+0 -50)
- docs/usage/speech-to-text.md (+54 -0)
- docs/usage/text-to-speech.md (+98 -0)
- mkdocs.yml (+15 -4)
.pre-commit-config.yaml CHANGED

@@ -44,4 +44,4 @@ repos:
     rev: v1.5.0
     hooks:
       - id: detect-secrets
-        exclude: 'README.md|tests/conftest.py|docs/usage
+        exclude: 'README.md|tests/conftest.py|docs/usage/*'
README.md CHANGED

@@ -78,15 +78,15 @@ openai api audio.translations.create -m Systran/faster-distil-whisper-large-v3 -
 ### OpenAI API Python SDK
 
 ```python
+from pathlib import Path
+
 from openai import OpenAI
 
-client = OpenAI(
-
-transcript = client.audio.transcriptions.create(
-
-)
-print(transcript.text)
+client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
+
+with Path("audio.wav").open("rb") as f:
+    transcript = client.audio.transcriptions.create(model="Systran/faster-distil-whisper-large-v3", file=f)
+    print(transcript.text)
 ```
 
 ### cURL
docs/introduction.md CHANGED

@@ -2,18 +2,26 @@
 
 Under development. I don't yet recommend using these docs as reference for now.
 
+TODO: add HuggingFace Space URL
+
 # Faster Whisper Server
 
-`faster-whisper-server` is an OpenAI API-compatible transcription
-
+`faster-whisper-server` is an OpenAI API-compatible server supporting transcription, translation, and speech generation. For transcription/translation it uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) and for text-to-speech [piper](https://github.com/rhasspy/piper) is used.
+
+## Features:
 
 - GPU and CPU support.
--
--
-- OpenAI API compatible.
+- [Deployable via Docker Compose / Docker](./installation.md)
+- [Highly configurable](./configuration.md)
+- OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with `faster-whisper-server`.
 - Streaming support (transcription is sent via [SSE](https://en.wikipedia.org/wiki/Server-sent_events) as the audio is transcribed. You don't need to wait for the audio to fully be transcribed before receiving it).
 - Live transcription support (audio is sent via websocket as it's generated).
 - Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
+- (Coming soon) Audio generation (chat completions endpoint) | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
+    - Generate a spoken audio summary of a body of text (text in, audio out)
+    - Perform sentiment analysis on a recording (audio in, text out)
+    - Async speech to speech interactions with a model (audio in, audio out)
+- (Coming soon) Realtime API | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
 
 Please create an issue if you find a bug, have a question, or a feature suggestion.
 
docs/usage/live-transcription.md ADDED

@@ -0,0 +1,17 @@
+## Live Transcription (using WebSocket)
+
+!!! note
+
+    More content will be added here soon.
+
+TODO: fix link
+From the [live-audio](./examples/live-audio) example
+
+https://github.com/fedirz/faster-whisper-server/assets/76551385/e334c124-af61-41d4-839c-874be150598f
+
+[websocat](https://github.com/vi/websocat?tab=readme-ov-file#installation) installation is required.
+Live transcription of audio data from a microphone.
+
+```bash
+ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le - | websocat --binary ws://localhost:8000/v1/audio/transcriptions
+```
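For reference, the ffmpeg-to-websocat pipe above can also be reproduced in Python. This is a minimal sketch, assuming the websocket endpoint accepts raw 16 kHz mono s16le PCM bytes and streams transcription text back as messages; the `websockets` package, the chunk size, and the polling loop are illustrative choices, not part of this commit:

```python
# Minimal sketch of the ffmpeg | websocat pipeline in Python.
# Assumptions (not confirmed by this commit): the endpoint accepts raw
# 16 kHz mono s16le PCM bytes and streams transcription text back.
import subprocess

from websockets.sync.client import connect  # pip install websockets

FFMPEG_CMD = [
    "ffmpeg", "-loglevel", "quiet", "-f", "alsa", "-i", "default",
    "-ac", "1", "-ar", "16000", "-f", "s16le", "-",
]

with connect("ws://localhost:8000/v1/audio/transcriptions") as ws:
    ffmpeg = subprocess.Popen(FFMPEG_CMD, stdout=subprocess.PIPE)
    try:
        while chunk := ffmpeg.stdout.read(4000):  # ~125 ms of 16 kHz s16le audio
            ws.send(chunk)
            try:
                print(ws.recv(timeout=0))  # print any transcription received so far
            except TimeoutError:
                pass
    finally:
        ffmpeg.terminate()
```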
docs/{usage.md → usage/open-webui-integration.md} RENAMED

@@ -1,53 +1,3 @@
-TODO: break this down into: transcription/translation, streaming transcription/translation, live transcription, audio generation, model listing
-TODO: add video demos for all
-TODO: add a note about OPENAI_API_KEY
-
-## Curl
-
-```bash
-curl http://localhost:8000/v1/audio/transcriptions -F "[email protected]"
-```
-
-## Python
-
-=== "httpx"
-
-    ```python
-    import httpx
-
-    with open('audio.wav', 'rb') as f:
-        files = {'file': ('audio.wav', f)}
-        response = httpx.post('http://localhost:8000/v1/audio/transcriptions', files=files)
-
-    print(response.text)
-    ```
-
-## OpenAI SDKs
-
-=== "Python"
-
-    ```python
-    import httpx
-
-    with open('audio.wav', 'rb') as f:
-        files = {'file': ('audio.wav', f)}
-        response = httpx.post('http://localhost:8000/v1/audio/transcriptions', files=files)
-
-    print(response.text)
-    ```
-
-=== "CLI"
-
-    ```bash
-    export OPENAI_BASE_URL=http://localhost:8000/v1/
-    export OPENAI_API_KEY="cant-be-empty"
-    openai api audio.transcriptions.create -m Systran/faster-whisper-small -f audio.wav --response-format text
-    ```
-
-=== "Other"
-
-    See [OpenAI libraries](https://platform.openai.com/docs/libraries) and [OpenAI speech-to-text usage](https://platform.openai.com/docs/guides/speech-to-text).
-
 ## Open WebUI
 
 ### Using the UI
docs/usage/speech-to-text.md ADDED

@@ -0,0 +1,54 @@
+https://platform.openai.com/docs/api-reference/audio/createTranscription
+https://platform.openai.com/docs/guides/speech-to-text
+
+TODO: add a note about automatic downloads
+TODO: add a note about api-key
+TODO: mention streaming
+TODO: add a demo
+TODO: talk about audio format
+
+## Curl
+
+```bash
+curl http://localhost:8000/v1/audio/transcriptions -F "[email protected]"
+```
+
+## Python
+
+=== "httpx"
+
+    ```python
+    import httpx
+
+    with open('audio.wav', 'rb') as f:
+        files = {'file': ('audio.wav', f)}
+        response = httpx.post('http://localhost:8000/v1/audio/transcriptions', files=files)
+
+    print(response.text)
+    ```
+
+## OpenAI SDKs
+
+=== "Python"
+
+    ```python
+    from openai import OpenAI
+
+    client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
+
+    with open("audio.wav", "rb") as f:
+        transcript = client.audio.transcriptions.create(model="Systran/faster-whisper-small", file=f)
+    print(transcript.text)
+    ```
+
+=== "CLI"
+
+    ```bash
+    export OPENAI_BASE_URL=http://localhost:8000/v1/
+    export OPENAI_API_KEY="cant-be-empty"
+    openai api audio.transcriptions.create -m Systran/faster-whisper-small -f audio.wav --response-format text
+    ```
+
+=== "Other"
+
+    See [OpenAI libraries](https://platform.openai.com/docs/libraries).
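The page above still carries a TODO to mention streaming. As a stopgap, here is a hedged httpx sketch of consuming the SSE stream that the introduction advertises; the `stream` form field name and the `data:` line format are assumptions, not confirmed by this commit:

```python
# Hedged sketch: consume streamed transcription results over SSE.
# The "stream" form field and the "data:" line format are assumptions,
# not confirmed by this commit.
import httpx

with open("audio.wav", "rb") as f:
    with httpx.stream(
        "POST",
        "http://localhost:8000/v1/audio/transcriptions",
        files={"file": ("audio.wav", f)},
        data={"stream": "true"},
        timeout=None,
    ) as response:
        for line in response.iter_lines():
            if line.startswith("data: "):
                print(line.removeprefix("data: "))  # partial transcript
```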
docs/usage/text-to-speech.md ADDED

@@ -0,0 +1,98 @@
+!!! warning
+
+    This feature is not supported on ARM devices, only x86_64. I was unable to build [piper-phonemize](https://github.com/rhasspy/piper-phonemize) (my [fork](https://github.com/fedirz/piper-phonemize))
+
+https://platform.openai.com/docs/api-reference/audio/createSpeech
+https://platform.openai.com/docs/guides/text-to-speech
+http://localhost:8001/faster-whisper-server/api/
+TODO: add a note about automatic downloads
+TODO: add a note about api-key
+TODO: add a demo
+
+## Prerequisite
+
+Download the piper voices from the [HuggingFace model repository](https://huggingface.co/rhasspy/piper-voices)
+
+```bash
+# Download all voices (~15 minutes / 7.7 GB)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices
+# Download all English voices (~4.5 minutes)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices --include 'en/**/*' 'voices.json'
+# Download all qualities of a specific voice (~4 seconds)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices --include 'en/en_US/amy/**/*' 'voices.json'
+# Download a specific quality of a specific voice (~2 seconds)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices --include 'en/en_US/amy/medium/*' 'voices.json'
+```
+
+!!! note
+
+    You can find audio samples of all the available voices [here](https://rhasspy.github.io/piper-samples/)
+
+## Curl
+
+```bash
+# Generate speech from text using the default values (response_format="mp3", speed=1.0, voice="en_US-amy-medium", etc.)
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!"}' --output audio.mp3
+# Specifying the output format
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!", "response_format": "wav"}' --output audio.wav
+# Specifying the audio speed
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!", "speed": 2.0}' --output audio.mp3
+
+# List available (downloaded) voices
+curl http://localhost:8000/v1/audio/speech/voices
+# List just the voice names
+curl http://localhost:8000/v1/audio/speech/voices | jq --raw-output '.[] | .voice'
+# List just the voices in your language
+curl --silent http://localhost:8000/v1/audio/speech/voices | jq --raw-output '.[] | select(.voice | startswith("en")) | .voice'
+
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!", "voice": "en_US-ryan-high"}' --output audio.mp3
+```
+
+## Python
+
+=== "httpx"
+
+    ```python
+    from pathlib import Path
+
+    import httpx
+
+    client = httpx.Client(base_url="http://localhost:8000/")
+    res = client.post(
+        "v1/audio/speech",
+        json={
+            "model": "piper",
+            "voice": "en_US-amy-medium",
+            "input": "Hello, world!",
+            "response_format": "mp3",
+            "speed": 1,
+        },
+    ).raise_for_status()
+    with Path("output.mp3").open("wb") as f:
+        f.write(res.read())
+    ```
+
+## OpenAI SDKs
+
+=== "Python"
+
+    ```python
+    from pathlib import Path
+
+    from openai import OpenAI
+
+    openai = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
+    res = openai.audio.speech.create(
+        model="piper",
+        voice="en_US-amy-medium",  # pyright: ignore[reportArgumentType]
+        input="Hello, world!",
+        response_format="mp3",
+        speed=1,
+    )
+    with Path("output.mp3").open("wb") as f:
+        f.write(res.response.read())
+    ```
+
+=== "Other"
+
+    See [OpenAI libraries](https://platform.openai.com/docs/libraries)
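The voice-listing curl commands above can also drive speech generation programmatically. A short httpx sketch, assuming, as the jq filters suggest, that `/v1/audio/speech/voices` returns a JSON array of objects with a `voice` key:

```python
# Sketch: pick an English voice from the listing, then synthesize with it.
# Assumes the voices endpoint returns [{"voice": ...}, ...], as the jq
# filters in the Curl section suggest.
import httpx

client = httpx.Client(base_url="http://localhost:8000/")
voices = client.get("v1/audio/speech/voices").raise_for_status().json()
english = [v["voice"] for v in voices if v["voice"].startswith("en")]

res = client.post(
    "v1/audio/speech",
    json={"model": "piper", "voice": english[0], "input": "Hello World!"},
).raise_for_status()
with open("audio.mp3", "wb") as f:
    f.write(res.read())
```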
mkdocs.yml CHANGED

@@ -1,6 +1,10 @@
 # yaml-language-server: $schema=https://squidfunk.github.io/mkdocs-material/schema.json
+# https://www.mkdocs.org/user-guide/configuration/#configuration
 site_name: Faster Whisper Server Documentation
-
+site_url: https://fedirz.github.io/faster-whisper-server/
+repo_url: https://github.com/fedirz/faster-whisper-server/
+edit_uri: edit/master/docs/
+docs_dir: docs
 theme:
   language: en
   name: material
@@ -9,13 +13,15 @@ theme:
     primary: deep orange
     accent: indigo
   features:
-
-    - content.code.copy
+    # https://squidfunk.github.io/mkdocs-material/setup/setting-up-navigation/
     - navigation.instant
     - navigation.instant.progress
    - navigation.instant.prefetch
+    # https://squidfunk.github.io/mkdocs-material/setup/setting-up-site-search/
     - search.highlight
     - search.share
+    - content.tabs.link
+    - content.code.copy
 plugins:
   # https://github.com/bharel/mkdocs-render-swagger-plugin
   - render_swagger
@@ -23,9 +29,13 @@ plugins:
       default_handler: python
 nav:
   - Introduction: introduction.md
+  - Capabilities / Usage:
+      - Speech-to-Text: usage/speech-to-text.md
+      - Text-to-Speech: usage/text-to-speech.md
+      - Live Transcription (using WebSockets): usage/live-transcription.md
+      - Open WebUI Integration: usage/open-webui-integration.md
   - Installation: installation.md
   - Configuration: configuration.md
-  - Usage: usage.md
   - API: api.md
 markdown_extensions:
   - admonition
@@ -34,3 +44,4 @@ markdown_extensions:
       alternate_style: true
   # https://github.com/mkdocs/mkdocs/issues/545
   - mdx_truly_sane_lists
+  # TODO: https://github.com/oprypin/markdown-callouts