title: Speaches
colorFrom: yellow
colorTo: pink
sdk: docker
app_port: 8000
suggested_hardware: t4-small
preload_from_hub:
- Systran/faster-distil-whisper-large-v3
- Systran/faster-distil-whisper-medium.en
- Systran/faster-distil-whisper-small.en
- Systran/faster-whisper-large-v3
- Systran/faster-whisper-medium
- Systran/faster-whisper-medium.en
- Systran/faster-whisper-small
- Systran/faster-whisper-small.en
- Systran/faster-whisper-tiny
- Systran/faster-whisper-tiny.en
- rhasspy/piper-voices
git remote add huggingface-space https://huggingface.co/spaces/speaches-ai/speaches
git push --force huggingface-space huggingface-space:main
TODO: Configure environment variables. See this.
This project was previously named faster-whisper-server. I've decided to change the name from faster-whisper-server, as the project has evolved to support more than just transcription.
Speaches
speaches is an OpenAI API-compatible server supporting transcription, translation, and speech generation. It uses faster-whisper for transcription and translation, and piper for text-to-speech.
Features:
- GPU and CPU support.
- Easily deployable using Docker.
- Configurable through environment variables (see config.py).
- OpenAI API compatible.
- Streaming support: the transcription is sent via SSE as the audio is transcribed, so you don't need to wait for the audio to be fully transcribed before receiving it.
- Live transcription support (audio is sent via websocket as it's generated).
- Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
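The dynamic loading and offloading behaviour described in the last item can be pictured as a cache keyed by model name, where each entry remembers when it was last used. The following is only a rough sketch of that idea under stated assumptions, not the project's actual implementation: the class name `IdleModelCache`, the `ttl` parameter, and the `loader` callback are all hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple


class IdleModelCache:
    """Hypothetical illustration: load on first use, unload after `ttl` idle seconds."""

    def __init__(self, loader: Callable[[str], object], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        # Maps model name -> (model object, time of last use).
        self._models: Dict[str, Tuple[object, float]] = {}

    def get(self, name: str) -> object:
        """Return the model named in a request, loading it if necessary."""
        self.evict_idle()
        entry = self._models.get(name)
        model = entry[0] if entry is not None else self._loader(name)
        self._models[name] = (model, self._clock())  # refresh the last-use time
        return model

    def evict_idle(self) -> None:
        """Drop every model that has not been used for at least `ttl` seconds."""
        now = self._clock()
        idle = [n for n, (_, used) in self._models.items() if now - used >= self._ttl]
        for name in idle:
            del self._models[name]

    def loaded(self) -> List[str]:
        """Names of the models currently held in memory."""
        return sorted(self._models)
```

In this sketch eviction only happens when `get` or `evict_idle` is called; a real server would more likely run the check on a timer.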
Please create an issue if you find a bug, have a question, or want to suggest a feature.
OpenAI API Compatibility ++
See OpenAI API reference for more information.
- Audio file transcription via POST /v1/audio/transcriptions endpoint.
  - Unlike OpenAI's API, speaches also supports streaming transcriptions (and translations). This is useful when you want to process large audio files and would rather receive the transcription in chunks as they are processed than wait for the whole file to be transcribed. It works similarly to chat messages when chatting with LLMs.
- Audio file translation via POST /v1/audio/translations endpoint.
- Live audio transcription via WS /v1/audio/transcriptions endpoint.
  - LocalAgreement2 (paper | original implementation) algorithm is used for live transcription.
  - Only transcription of single-channel, 16000 Hz sample rate, raw, 16-bit little-endian audio is supported.
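As a rough intuition for LocalAgreement2: the audio buffer is transcribed repeatedly as it grows, and a word is only treated as final once two consecutive hypotheses agree on it. The sketch below shows that prefix-agreement step in isolation. It is not the code used by speaches or by the original implementation, and the function names are hypothetical.

```python
from typing import List


def agreed_prefix(previous: List[str], current: List[str]) -> List[str]:
    """Return the longest common word prefix of two consecutive hypotheses."""
    prefix: List[str] = []
    for prev_word, curr_word in zip(previous, current):
        if prev_word != curr_word:
            break
        prefix.append(curr_word)
    return prefix


def newly_confirmed(committed: List[str], previous: List[str], current: List[str]) -> List[str]:
    """Words both hypotheses agree on that have not been committed yet.

    Assumes `committed` is itself a prefix of the agreed words.
    """
    return agreed_prefix(previous, current)[len(committed):]
```

For example, given the hypotheses "the quick brown fox jumps" and then "the quick brown fox jumped over", only "the quick brown fox" is confirmed; the tail stays provisional until a later hypothesis repeats it.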
Quick Start
Using Docker Compose (Recommended)
NOTE: I'm using newer Docker Compose features. If you are using an older version of Docker Compose, you may need to update.
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.yaml
# for GPU support
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cuda.yaml
docker compose --file compose.cuda.yaml up --detach
# for CPU only (use this if you don't have a GPU, as the image is much smaller)
curl --silent --remote-name https://raw.githubusercontent.com/speaches-ai/speaches/master/compose.cpu.yaml
docker compose --file compose.cpu.yaml up --detach
Using Docker
# for GPU support
docker run --gpus=all --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --detach ghcr.io/speaches-ai/speaches:latest-cuda
# for CPU only (use this if you don't have a GPU, as the image is much smaller)
docker run --publish 8000:8000 --volume hf-hub-cache:/home/ubuntu/.cache/huggingface/hub --env WHISPER__MODEL=Systran/faster-whisper-small --detach ghcr.io/speaches-ai/speaches:latest-cpu
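The `WHISPER__MODEL` variable in the CPU command suggests that a double underscore separates a settings group from a field name. That convention is inferred from the variable name and is an assumption here; config.py remains the authoritative list of options. Under that assumption, the mapping from flat environment variables to nested settings can be sketched as follows (`nest_env` is a hypothetical helper, not part of the project):

```python
from typing import Any, Dict, Mapping


def nest_env(env: Mapping[str, str], delimiter: str = "__") -> Dict[str, Any]:
    """Turn flat GROUP__FIELD variables into a nested, lower-cased dict."""
    nested: Dict[str, Any] = {}
    for key, value in env.items():
        # Everything before the last delimiter names a group; the rest is the field.
        *groups, field = key.lower().split(delimiter)
        node = nested
        for group in groups:
            node = node.setdefault(group, {})
        node[field] = value
    return nested


# The variable from the docker command becomes a `model` field in a `whisper` group.
print(nest_env({"WHISPER__MODEL": "Systran/faster-whisper-small"}))
```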
Using Kubernetes
Follow this tutorial
Usage
If you are looking for a step-by-step walkthrough, check out this YouTube video.
OpenAI API CLI
export OPENAI_API_KEY="cant-be-empty"
export OPENAI_BASE_URL=http://localhost:8000/v1/
openai api audio.transcriptions.create -m Systran/faster-distil-whisper-large-v3 -f audio.wav --response-format text
openai api audio.translations.create -m Systran/faster-distil-whisper-large-v3 -f audio.wav --response-format verbose_json
OpenAI API Python SDK
from pathlib import Path

from openai import OpenAI

# The API key must be non-empty.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")

with Path("audio.wav").open("rb") as f:
    transcript = client.audio.transcriptions.create(model="Systran/faster-distil-whisper-large-v3", file=f)

print(transcript.text)
cURL
# If `model` isn't specified, the default model is used
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav"
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav" -F "stream=true"
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav" -F "model=Systran/faster-distil-whisper-large-v3"
# It's recommended that you always specify the language as that will reduce the transcription time
curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav" -F "language=en"
curl http://localhost:8000/v1/audio/translations -F "file=@audio.wav"
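With `stream=true`, the transcription arrives as server-sent events rather than as a single response body. A minimal sketch of pulling the `data:` payloads out of such a stream is shown below. It assumes standard SSE framing (events separated by blank lines) and treats each payload as opaque text, since the exact payload shape is not described here. `iter_sse_data` is a hypothetical helper name.

```python
from typing import Iterable, Iterator, List


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event in a stream of text lines."""
    buffer: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (starting with ":") and other fields are ignored here.
    if buffer:
        # Lenient choice: flush a trailing event that lacked a closing blank line.
        yield "\n".join(buffer)
```

The function accepts any iterable of text lines, so it could be fed from an HTTP client's line iterator, for instance `response.iter_lines(decode_unicode=True)` when using the `requests` library with `stream=True`.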
Live Transcription (using WebSocket)
From live-audio example
https://github.com/fedirz/faster-whisper-server/assets/76551385/e334c124-af61-41d4-839c-874be150598f
Live transcription of audio data from a microphone. This requires websocat to be installed.
ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le - | websocat --binary ws://localhost:8000/v1/audio/transcriptions
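The ffmpeg flags above (`-ac 1 -ar 16000 -f s16le`) produce the only format the live endpoint accepts: single-channel, 16000 Hz, raw 16-bit little-endian PCM. For clients that generate audio in Python instead, the same format can be produced with the standard library. This is a sketch under assumptions: the helper names and the 100 ms chunk duration are illustrative choices, not values required by the server.

```python
import struct
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000   # samples per second, as required by the endpoint
BYTES_PER_SAMPLE = 2   # 16-bit samples, single channel


def to_s16le(samples: Sequence[float]) -> bytes:
    """Encode float samples in [-1.0, 1.0] as raw 16-bit little-endian PCM."""
    # Scale to the int16 range and clamp anything that falls outside it.
    ints = [max(-32768, min(32767, round(s * 32767))) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def chunk_pcm(pcm: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Split PCM bytes into chunks of `chunk_ms` milliseconds (the last may be shorter)."""
    size = SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

Each chunk would then be sent as one binary WebSocket message to ws://localhost:8000/v1/audio/transcriptions, which is what `websocat --binary` does with the ffmpeg output.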