Fedir Zadniprovskyi committed
Commit b56d19a · 1 Parent(s): 73e7d80

docs: usage pages (and more)
.pre-commit-config.yaml CHANGED
@@ -44,4 +44,4 @@ repos:
     rev: v1.5.0
     hooks:
       - id: detect-secrets
-        exclude: 'README.md|tests/conftest.py|docs/usage.md'
+        exclude: 'README.md|tests/conftest.py|docs/usage/*'
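Note: pre-commit applies `exclude` as a Python regular expression via `re.search`, so `docs/usage/*` (literally "docs/usage" plus zero or more slashes) already matches every file under the new directory, though `docs/usage/.*` would be the more conventional spelling. A minimal sketch of how the pattern behaves, with hypothetical paths:

```python
# Sketch: how pre-commit's `exclude` regex is applied (paths are hypothetical).
import re

exclude = re.compile(r"README.md|tests/conftest.py|docs/usage/*")
print(bool(exclude.search("docs/usage/speech-to-text.md")))  # True -> excluded
print(bool(exclude.search("src/faster_whisper_server/main.py")))  # False
```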
README.md CHANGED
@@ -78,15 +78,15 @@ openai api audio.translations.create -m Systran/faster-distil-whisper-large-v3 -
 ### OpenAI API Python SDK
 
 ```python
+from pathlib import Path
+
 from openai import OpenAI
 
-client = OpenAI(api_key="cant-be-empty", base_url="http://localhost:8000/v1/")
+client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
 
-audio_file = open("audio.wav", "rb")
-transcript = client.audio.transcriptions.create(
-    model="Systran/faster-distil-whisper-large-v3", file=audio_file
-)
-print(transcript.text)
+with Path("audio.wav").open("rb") as f:
+    transcript = client.audio.transcriptions.create(model="Systran/faster-distil-whisper-large-v3", file=f)
+    print(transcript.text)
 ```
 
 ### cURL
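For reference, the translations endpoint visible in the hunk header follows the same SDK pattern; a minimal sketch, assuming the same model name as the transcription example:

```python
# Sketch: OpenAI-compatible translations call, mirroring the transcription
# example above (model name taken from the hunk header; not verified here).
from pathlib import Path

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")

with Path("audio.wav").open("rb") as f:
    translation = client.audio.translations.create(model="Systran/faster-distil-whisper-large-v3", file=f)
    print(translation.text)
```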
docs/introduction.md CHANGED
@@ -2,18 +2,26 @@
 
 Under development. I don't yet recommend using these docs as reference for now.
 
+TODO: add HuggingFace Space URL
+
 # Faster Whisper Server
 
-`faster-whisper-server` is an OpenAI API-compatible transcription server which uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) as its backend.
-Features:
+`faster-whisper-server` is an OpenAI API-compatible server supporting transcription, translation, and speech generation. It uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) for transcription/translation and [piper](https://github.com/rhasspy/piper) for text-to-speech.
+
+## Features:
 
 - GPU and CPU support.
-- Easily deployable using Docker.
-- **Configurable through environment variables (see [config.py](./src/faster_whisper_server/config.py))**.
-- OpenAI API compatible.
+- [Deployable via Docker Compose / Docker](./installation.md)
+- [Highly configurable](./configuration.md)
+- OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with `faster-whisper-server`.
 - Streaming support (transcription is sent via [SSE](https://en.wikipedia.org/wiki/Server-sent_events) as the audio is transcribed; you don't need to wait for the audio to be fully transcribed before receiving it).
 - Live transcription support (audio is sent via WebSocket as it's generated).
 - Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
+- (Coming soon) Audio generation (chat completions endpoint) | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
+    - Generate a spoken audio summary of a body of text (text in, audio out)
+    - Perform sentiment analysis on a recording (audio in, text out)
+    - Async speech-to-speech interactions with a model (audio in, audio out)
+- (Coming soon) Realtime API | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
 
 Please create an issue if you find a bug, have a question, or a feature suggestion.
 
docs/usage/live-transcription.md ADDED
@@ -0,0 +1,17 @@
+## Live Transcription (using WebSocket)
+
+!!! note
+
+    More content will be added here soon.
+
+TODO: fix link
+From the [live-audio](./examples/live-audio) example:
+
+https://github.com/fedirz/faster-whisper-server/assets/76551385/e334c124-af61-41d4-839c-874be150598f
+
+[websocat](https://github.com/vi/websocat?tab=readme-ov-file#installation) must be installed.
+The following command live-transcribes audio from a microphone:
+
+```bash
+ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le - | websocat --binary ws://localhost:8000/v1/audio/transcriptions
+```
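For reference, a minimal Python client sketch of the same pipeline, assuming (as the `websocat` command above does) that the endpoint accepts raw 16 kHz mono s16le PCM as binary WebSocket messages; the format of the server's reply messages is an assumption, not verified here:

```python
# Sketch using the `websockets` library (pip install websockets). Mirrors the
# ffmpeg/websocat pipeline above: binary PCM in, transcription messages out.
import asyncio
import sys

import websockets


async def main() -> None:
    async with websockets.connect("ws://localhost:8000/v1/audio/transcriptions") as ws:

        async def send_audio() -> None:
            loop = asyncio.get_running_loop()
            # Read raw PCM from stdin, e.g. piped in from the ffmpeg command above.
            while chunk := await loop.run_in_executor(None, sys.stdin.buffer.read, 4096):
                await ws.send(chunk)
            await ws.close()  # ends the receive loop below

        async def print_transcripts() -> None:
            async for message in ws:
                print(message)

        await asyncio.gather(send_audio(), print_transcripts())


asyncio.run(main())
```

It can be fed the same way as `websocat`: `ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le - | python client.py` (script name hypothetical).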
docs/{usage.md → usage/open-webui-integration.md} RENAMED
@@ -1,53 +1,3 @@
-TODO: break this down into: transcription/translation, streaming transcription/translation, live transcription, audio generation, model listing
-TODO: add video demos for all
-TODO: add a note about OPENAI_API_KEY
-
-## Curl
-
-```bash
-curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav"
-```
-
-## Python
-
-=== "httpx"
-
-    ```python
-    import httpx
-
-    with open('audio.wav', 'rb') as f:
-        files = {'file': ('audio.wav', f)}
-        response = httpx.post('http://localhost:8000/v1/audio/transcriptions', files=files)
-
-    print(response.text)
-    ```
-
-## OpenAI SDKs
-
-=== "Python"
-
-    ```python
-    import httpx
-
-    with open('audio.wav', 'rb') as f:
-        files = {'file': ('audio.wav', f)}
-        response = httpx.post('http://localhost:8000/v1/audio/transcriptions', files=files)
-
-    print(response.text)
-    ```
-
-=== "CLI"
-
-    ```bash
-    export OPENAI_BASE_URL=http://localhost:8000/v1/
-    export OPENAI_API_KEY="cant-be-empty"
-    openai api audio.transcriptions.create -m Systran/faster-whisper-small -f audio.wav --response-format text
-    ```
-
-=== "Other"
-
-    See [OpenAI libraries](https://platform.openai.com/docs/libraries) and [OpenAI speech-to-text usage](https://platform.openai.com/docs/guides/speech-to-text).
-
 ## Open WebUI
 
 ### Using the UI
docs/usage/speech-to-text.md ADDED
@@ -0,0 +1,54 @@
+https://platform.openai.com/docs/api-reference/audio/createTranscription
+https://platform.openai.com/docs/guides/speech-to-text
+
+TODO: add a note about automatic downloads
+TODO: add a note about api-key
+TODO: mention streaming
+TODO: add a demo
+TODO: talk about audio format
+
+## Curl
+
+```bash
+curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav"
+```
+
+## Python
+
+=== "httpx"
+
+    ```python
+    import httpx
+
+    with open('audio.wav', 'rb') as f:
+        files = {'file': ('audio.wav', f)}
+        response = httpx.post('http://localhost:8000/v1/audio/transcriptions', files=files)
+
+    print(response.text)
+    ```
+
+## OpenAI SDKs
+
+=== "Python"
+
+    ```python
+    from openai import OpenAI

+    client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
+
+    with open("audio.wav", "rb") as f:
+        transcript = client.audio.transcriptions.create(model="Systran/faster-whisper-small", file=f)
+    print(transcript.text)
+    ```
+
+=== "CLI"
+
+    ```bash
+    export OPENAI_BASE_URL=http://localhost:8000/v1/
+    export OPENAI_API_KEY="cant-be-empty"
+    openai api audio.transcriptions.create -m Systran/faster-whisper-small -f audio.wav --response-format text
+    ```
+
+=== "Other"
+
+    See [OpenAI libraries](https://platform.openai.com/docs/libraries).
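A related sketch: the removed usage.md TODO above mentions model listing, and since the server is OpenAI API-compatible, the SDK's models call should work against it, assuming it implements the OpenAI-compatible `GET /v1/models` endpoint (not verified here):

```python
# Sketch: list available models through the OpenAI SDK, assuming the
# server implements the OpenAI-compatible GET /v1/models endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
for model in client.models.list():
    print(model.id)
```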
docs/usage/text-to-speech.md ADDED
@@ -0,0 +1,98 @@
+!!! warning
+
+    This feature is not supported on ARM devices, only x86_64; I was unable to build [piper-phonemize](https://github.com/rhasspy/piper-phonemize) (my [fork](https://github.com/fedirz/piper-phonemize)) for ARM.
+
+https://platform.openai.com/docs/api-reference/audio/createSpeech
+https://platform.openai.com/docs/guides/text-to-speech
+http://localhost:8001/faster-whisper-server/api/
+TODO: add a note about automatic downloads
+TODO: add a note about api-key
+TODO: add a demo
+
+## Prerequisite
+
+Download the piper voices from the [HuggingFace model repository](https://huggingface.co/rhasspy/piper-voices).
+
+```bash
+# Download all voices (~15 minutes / ~7.7 GB)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices
+# Download all English voices (~4.5 minutes)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices --include 'en/**/*' 'voices.json'
+# Download all qualities of a specific voice (~4 seconds)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices --include 'en/en_US/amy/**/*' 'voices.json'
+# Download a specific quality of a specific voice (~2 seconds)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices --include 'en/en_US/amy/medium/*' 'voices.json'
+```
+
+!!! note
+
+    You can find audio samples of all the available voices [here](https://rhasspy.github.io/piper-samples/).
+
+## Curl
+
+```bash
+# Generate speech from text using the default values (response_format="mp3", speed=1.0, voice="en_US-amy-medium", etc.)
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!"}' --output audio.mp3
+# Specifying the output format
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!", "response_format": "wav"}' --output audio.wav
+# Specifying the audio speed
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!", "speed": 2.0}' --output audio.mp3
+
+# List available (downloaded) voices
+curl http://localhost:8000/v1/audio/speech/voices
+# List just the voice names
+curl http://localhost:8000/v1/audio/speech/voices | jq --raw-output '.[] | .voice'
+# List just the voices in your language
+curl --silent http://localhost:8000/v1/audio/speech/voices | jq --raw-output '.[] | select(.voice | startswith("en")) | .voice'
+
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!", "voice": "en_US-ryan-high"}' --output audio.mp3
+```
+
+## Python
+
+=== "httpx"
+
+    ```python
+    from pathlib import Path
+
+    import httpx
+
+    client = httpx.Client(base_url="http://localhost:8000/")
+    res = client.post(
+        "v1/audio/speech",
+        json={
+            "model": "piper",
+            "voice": "en_US-amy-medium",
+            "input": "Hello, world!",
+            "response_format": "mp3",
+            "speed": 1,
+        },
+    ).raise_for_status()
+    with Path("output.mp3").open("wb") as f:
+        f.write(res.read())
+    ```
+
+## OpenAI SDKs
+
+=== "Python"
+
+    ```python
+    from pathlib import Path
+
+    from openai import OpenAI
+
+    openai = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
+    res = openai.audio.speech.create(
+        model="piper",
+        voice="en_US-amy-medium",  # pyright: ignore[reportArgumentType]
+        input="Hello, world!",
+        response_format="mp3",
+        speed=1,
+    )
+    with Path("output.mp3").open("wb") as f:
+        f.write(res.response.read())
+    ```
+
+=== "Other"
+
+    See [OpenAI libraries](https://platform.openai.com/docs/libraries)
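A hedged Python equivalent of the voice-listing curl calls above, assuming the same response shape (a JSON list of objects with a `voice` field, as the `jq` filters imply):

```python
# Sketch: list downloaded piper voices, mirroring the curl/jq examples above.
# Assumes /v1/audio/speech/voices returns a JSON list of {"voice": ...} objects.
import httpx

res = httpx.get("http://localhost:8000/v1/audio/speech/voices")
res.raise_for_status()
voices = [v["voice"] for v in res.json()]
print([v for v in voices if v.startswith("en")])  # just the English voices
```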
mkdocs.yml CHANGED
@@ -1,6 +1,10 @@
 # yaml-language-server: $schema=https://squidfunk.github.io/mkdocs-material/schema.json
+# https://www.mkdocs.org/user-guide/configuration/#configuration
 site_name: Faster Whisper Server Documentation
-repo_url: https://github.com/fedirz/faster-whisper-server
+site_url: https://fedirz.github.io/faster-whisper-server/
+repo_url: https://github.com/fedirz/faster-whisper-server/
+edit_uri: edit/master/docs/
+docs_dir: docs
 theme:
   language: en
   name: material
@@ -9,13 +13,15 @@ theme:
     primary: deep orange
     accent: indigo
   features:
-    - content.tabs.link
-    - content.code.copy
+    # https://squidfunk.github.io/mkdocs-material/setup/setting-up-navigation/
     - navigation.instant
     - navigation.instant.progress
     - navigation.instant.prefetch
+    # https://squidfunk.github.io/mkdocs-material/setup/setting-up-site-search/
     - search.highlight
     - search.share
+    - content.tabs.link
+    - content.code.copy
 plugins:
   # https://github.com/bharel/mkdocs-render-swagger-plugin
   - render_swagger
@@ -23,9 +29,13 @@ plugins:
       default_handler: python
 nav:
   - Introduction: introduction.md
+  - Capabilities / Usage:
+      - Speech-to-Text: usage/speech-to-text.md
+      - Text-to-Speech: usage/text-to-speech.md
+      - Live Transcription (using WebSockets): usage/live-transcription.md
+      - Open WebUI Integration: usage/open-webui-integration.md
   - Installation: installation.md
   - Configuration: configuration.md
-  - Usage: usage.md
   - API: api.md
 markdown_extensions:
   - admonition
@@ -34,3 +44,4 @@ markdown_extensions:
       alternate_style: true
   # https://github.com/mkdocs/mkdocs/issues/545
   - mdx_truly_sane_lists
+  # TODO: https://github.com/oprypin/markdown-callouts