---
sdk: gradio
sdk_version: 5.16.0
---
# Whisper-WebUI
A Gradio-based browser interface for Whisper
## Features
- Select the Whisper implementation you want to use from:
  - openai/whisper
  - SYSTRAN/faster-whisper (used by default; a minimal usage sketch follows this list)
  - Vaibhavs10/insanely-fast-whisper
- Generate transcriptions from various sources, including files & microphone
- Currently supported output formats: csv, srt & txt
- Speech to Text Translation:
  - From other languages to English (this is Whisper's end-to-end speech-to-text translation feature)
  - Translate transcription files using Facebook NLLB models
- Pre-processing audio input with Silero VAD
- Post-processing with speaker diarization using the pyannote model:
  - To download the pyannote model, you need a Hugging Face token and must manually accept pyannote's terms on the corresponding Hugging Face model pages
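
For illustration, here is a minimal sketch of how the default faster-whisper backend is typically used, with Silero VAD filtering enabled. This is not Whisper-WebUI's actual code; the model name and audio path are placeholders:

```python
# Minimal faster-whisper sketch (illustrative, not Whisper-WebUI's actual code).
from faster_whisper import WhisperModel

# Placeholder model name; any Whisper size from "tiny" to "large-v2" works.
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

# vad_filter=True runs Silero VAD to skip silence before transcription.
segments, info = model.transcribe("audio.mp3", beam_size=5, vad_filter=True)

print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```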
## Installation and Running
### Run Locally
#### Prerequisite
To run this WebUI, you need `git`, `python` version 3.8 ~ 3.10, and `FFmpeg`.<br>
If you're not using an Nvidia GPU, or you're using a `CUDA` version other than 12.4, edit **requirements.txt** to match your environment.

Please follow the links below to install the necessary software:
- git : [https://git-scm.com/downloads](https://git-scm.com/downloads)
- python : [https://www.python.org/downloads/](https://www.python.org/downloads/)
- FFmpeg : [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html)
- CUDA : [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads)

After installing FFmpeg, **make sure to add the `FFmpeg/bin` folder to your system PATH!**
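
As a quick sanity check, the sketch below (illustrative, not part of the project) verifies the Python version, that FFmpeg is on the PATH, and whether PyTorch can see a CUDA device:

```python
# Illustrative environment check; not part of Whisper-WebUI itself.
import shutil
import sys

# Python 3.8 ~ 3.10 is required.
assert (3, 8) <= sys.version_info[:2] <= (3, 10), "Python 3.8 ~ 3.10 required"

# FFmpeg must be reachable on the system PATH.
assert shutil.which("ffmpeg") is not None, "FFmpeg not found on PATH"

try:
    import torch  # installed later by the install script
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch not installed yet; run the install script first")
```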
#### Installation Using the Script Files
1. Download the repository and extract its contents
2. Run `install.bat` or `install.sh` to install dependencies (it will create a `venv` directory and install dependencies there)
3. Start the WebUI with `start-webui.bat` or `start-webui.sh` (it will run `python app.py` after activating the venv)
### Running with Docker
- Install and launch Docker-Desktop
- Get the repository
- Build the image (the image is about ~7GB)

  ```
  docker compose build
  ```

- Run the container

  ```
  docker compose up
  ```

- Connect to the WebUI with your browser at http://localhost:7860

If needed, update `docker-compose.yaml` to match your environment (see the excerpt below).
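
For example, to pass an Nvidia GPU through to the container, the compose file typically needs a device reservation like the following. This is an illustrative excerpt, and the service name `whisper-webui` is an assumption; check the actual service name in your `docker-compose.yaml`:

```yaml
# Illustrative excerpt; the service name "whisper-webui" is an assumption.
services:
  whisper-webui:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```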
## VRAM Usages
This project is integrated with faster-whisper by default for better VRAM usage and transcription speed
According to faster-whisper, the efficiency of the optimized whisper model is as follows:
| Implementation | Precision | Beam size | Time  | Max. GPU memory | Max. CPU memory |
|----------------|-----------|-----------|-------|-----------------|-----------------|
| openai/whisper | fp16      | 5         | 4m30s | 11325MB         | 9439MB          |
| faster-whisper | fp16      | 5         | 54s   | 4755MB          | 3244MB          |
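
If VRAM is still tight, faster-whisper can also load quantized weights via its `compute_type` option, which reduces memory use further. A sketch with placeholder model names; exact savings depend on the model and hardware:

```python
# Sketch: trading precision for VRAM with faster-whisper's compute_type.
from faster_whisper import WhisperModel

# float16 on GPU: the configuration measured in the table above.
model_fp16 = WhisperModel("large-v2", device="cuda", compute_type="float16")

# int8_float16: 8-bit quantized weights, lower VRAM use.
model_int8 = WhisperModel("large-v2", device="cuda", compute_type="int8_float16")

# CPU-only fallback with 8-bit weights.
model_cpu = WhisperModel("large-v2", device="cpu", compute_type="int8")
```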
## Available models
This is Whisper's original VRAM usage table for models:
| Size   | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|--------|------------|--------------------|--------------------|---------------|----------------|
| tiny   | 39 M       | tiny.en            | tiny               | ~1 GB         | ~32x           |
| base   | 74 M       | base.en            | base               | ~1 GB         | ~16x           |
| small  | 244 M      | small.en           | small              | ~2 GB         | ~6x            |
| medium | 769 M      | medium.en          | medium             | ~5 GB         | ~2x            |
| large  | 1550 M     | N/A                | large              | ~10 GB        | 1x             |
Note: `.en` models are for English only, and you can use the `Translate to English` option from the other models.
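
To illustrate the difference (a sketch using faster-whisper, with placeholder model names and file paths): an `.en` model only transcribes English, while a multilingual model can also translate speech into English with `task="translate"`:

```python
# Sketch: English-only vs. multilingual models (placeholder names/paths).
from faster_whisper import WhisperModel

# English-only model: transcription of English audio only.
en_model = WhisperModel("small.en", device="cuda", compute_type="float16")
segments, _ = en_model.transcribe("english_audio.mp3")

# Multilingual model: task="translate" yields English text from
# speech in another language (Whisper's built-in translation).
ml_model = WhisperModel("small", device="cuda", compute_type="float16")
segments, _ = ml_model.transcribe("french_audio.mp3", task="translate")

for segment in segments:
    print(segment.text)
```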