---
sdk: gradio
sdk_version: 5.16.0
---

# Whisper-WebUI

A Gradio-based browser interface for Whisper

## Features

## Installation and Running

- **Run Locally**

    **Prerequisite**

    To run this WebUI, you need `git`, `python` version 3.8 ~ 3.10, and `FFmpeg`.<br>
    If you are not using an Nvidia GPU, or you are using a `CUDA` version other than 12.4, edit **requirements.txt** to match your environment.
    
    Please follow the links below to install the necessary software:
    - git : [https://git-scm.com/downloads](https://git-scm.com/downloads)
    - python : [https://www.python.org/downloads/](https://www.python.org/downloads/)
    - FFmpeg :  [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html)
    - CUDA : [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads)
    
    After installing FFmpeg, **make sure to add the `FFmpeg/bin` folder to your system PATH!**
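    As a rough sketch of the kind of edit this means (the exact package lines in the project's `requirements.txt` may differ), switching PyTorch builds is typically done by changing the pip index URL at the top of the file:

    ```
    # CPU-only build (no CUDA):
    --extra-index-url https://download.pytorch.org/whl/cpu
    # or, for a different CUDA version, e.g. 12.1:
    # --extra-index-url https://download.pytorch.org/whl/cu121
    torch
    torchaudio
    ```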
    

    **Installation Using the Script Files**

    1. Download the repository and extract its contents
    2. Run `install.bat` or `install.sh` to install dependencies (It will create a `venv` directory and install dependencies there)
    3. Start WebUI with `start-webui.bat` or `start-webui.sh` (It will run `python app.py` after activating the venv)
    
- **Running with Docker**

    1. Install and launch Docker-Desktop

    2. Get the repository

    3. Build the image (the image is about ~7GB):

       ```
       docker compose build
       ```

    4. Run the container:

       ```
       docker compose up
       ```

    5. Connect to the WebUI with your browser at http://localhost:7860

    If needed, update `docker-compose.yaml` to match your environment.
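    As an illustrative sketch of such a change (the service name and layout here are hypothetical, not taken from the project's actual file), remapping the host port or passing an Nvidia GPU through to the container looks like:

    ```yaml
    services:
      whisper-webui:          # hypothetical service name
        build: .
        ports:
          - "7860:7860"       # host:container; change the left side to use another host port
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: 1
                  capabilities: [gpu]
    ```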

## VRAM Usages

This project is integrated with faster-whisper by default for better VRAM usage and transcription speed.

According to faster-whisper, the efficiency of the optimized Whisper model is as follows:

| Implementation | Precision | Beam size | Time | Max. GPU memory | Max. CPU memory |
|----------------|-----------|-----------|------|-----------------|-----------------|
| openai/whisper | fp16 | 5 | 4m30s | 11325MB | 9439MB |
| faster-whisper | fp16 | 5 | 54s | 4755MB | 3244MB |

## Available models

This is Whisper's original VRAM usage table for models:

| Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|--------|-------|-----------|--------|---------|------|
| tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x |
| base | 74 M | base.en | base | ~1 GB | ~16x |
| small | 244 M | small.en | small | ~2 GB | ~6x |
| medium | 769 M | medium.en | medium | ~5 GB | ~2x |
| large | 1550 M | N/A | large | ~10 GB | 1x |

Note: `.en` models are for English only. With the multilingual models, you can use the Translate to English option.
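As an illustration of how to read the table, the snippet below picks the largest model whose approximate VRAM requirement fits a given GPU. The helper and its names are hypothetical, not part of Whisper-WebUI:

```python
# Approximate "Required VRAM" values (in GB) from the table above.
# Hypothetical helper for illustration only; not part of Whisper-WebUI.
MODEL_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

def largest_model_that_fits(available_vram_gb):
    """Return the largest model whose approximate VRAM need fits, else None."""
    best = None
    for name, need in MODEL_VRAM_GB.items():
        if need <= available_vram_gb and (best is None or need > MODEL_VRAM_GB[best]):
            best = name
    return best

print(largest_model_that_fits(6))   # a 6 GB GPU can hold up to "medium"
print(largest_model_that_fits(12))  # 12 GB fits "large"
```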