# Faster Whisper Transcription Service

## Overview

This project uses the `faster_whisper` Python package to provide an API endpoint for audio transcription. It runs OpenAI's Whisper model (large-v3) for accurate and efficient speech-to-text conversion. The service is designed to be deployed on Hugging Face endpoints.

## Features

- **Efficient Transcription**: Uses the large-v3 Whisper model for high-quality transcription.
- **Multilingual Support**: Supports transcription in various languages, with German (`de`) as the default language.
- **Segmented Output**: Returns the transcribed text as segments, each with a segment ID and timestamps.

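To illustrate how segmented output might be consumed on the client side, here is a minimal sketch that renders segments as timestamped lines. The response schema is not documented here, so the keys `id`, `start`, `end`, and `text` (with times in seconds) are assumptions, and `format_timestamp` and `format_segment` are hypothetical helper names.

```python
def format_timestamp(seconds):
    """Render a time offset in seconds as MM:SS.ss."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:05.2f}"


def format_segment(segment):
    """Render one segment dict as a single readable line.

    Assumes the keys "id", "start", "end", and "text"; adjust them to
    match the actual response of the deployed endpoint.
    """
    start = format_timestamp(segment["start"])
    end = format_timestamp(segment["end"])
    return f"[{segment['id']}] {start} -> {end} {segment['text'].strip()}"


# Hand-written stand-in data, not real endpoint output
example_segments = [
    {"id": 1, "start": 0.0, "end": 2.48, "text": " Guten Morgen."},
    {"id": 2, "start": 65.5, "end": 68.0, "text": " Wie geht es Ihnen?"},
]

for segment in example_segments:
    print(format_segment(segment))
```
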
## Usage

```python
import os

import requests

# Request payload: the base64-encoded audio, the language of the recording, and the task
DATA = {
    "inputs": "<base64_encoded_audio>",
    "language": "de",
    "task": "transcribe",
}

HF_ACCESS_TOKEN = os.environ.get("HF_TRANSCRIPTION_ACCESS_TOKEN")
API_URL = os.environ.get("HF_TRANSCRIPTION_ENDPOINT")

HEADERS = {
    # Hugging Face endpoints expect a bearer token; drop the "Bearer " prefix
    # here if the environment variable already contains it
    "Authorization": f"Bearer {HF_ACCESS_TOKEN}",
    "Content-Type": "application/json",
}

response = requests.post(API_URL, headers=HEADERS, json=DATA)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json())
```

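The `inputs` field carries the audio as a base64 string. A minimal sketch of building the payload from raw audio bytes follows; it assumes the service expects standard (not URL-safe) base64 decoded as UTF-8 text, and `build_payload` is a hypothetical helper name.

```python
import base64


def build_payload(audio_bytes, language="de", task="transcribe"):
    """Build the request body from raw audio bytes.

    Assumes standard base64 encoding for the "inputs" field.
    """
    encoded = base64.b64encode(audio_bytes).decode("utf-8")
    return {"inputs": encoded, "language": language, "task": task}


# In practice the bytes would come from a file, for example
# pathlib.Path("sample.wav").read_bytes() (hypothetical file name).
payload = build_payload(b"placeholder audio bytes")
print(sorted(payload.keys()))
```

The resulting dict can be passed as `json=payload` to `requests.post`.
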
## Logging

Logging is set to the debug level. It reports the length of the decoded bytes, the progress of the segments as they are transcribed, and a confirmation once inference is complete.

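As a rough illustration of that logging behavior, the sketch below configures the standard `logging` module at debug level and emits the three kinds of messages described. The logger name, the message wording, and the `log_transcription` helper are assumptions, not the service's actual code.

```python
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("transcription")  # hypothetical logger name


def log_transcription(audio_bytes, segments):
    """Log the input size, each segment as it arrives, and completion.

    Returns the number of segments seen.
    """
    logger.debug("Decoded %d bytes of audio", len(audio_bytes))
    count = 0
    for count, segment in enumerate(segments, start=1):
        logger.debug("Transcribed segment %d: %s", count, segment)
    logger.debug("Inference completed with %d segments", count)
    return count


log_transcription(b"abc", ["Guten Morgen.", "Wie geht es Ihnen?"])
```
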
## Deployment

This service is intended for deployment on Hugging Face endpoints. Follow Hugging Face's guidelines for deploying model endpoints.

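Hugging Face custom endpoints conventionally load a `handler.py` that defines an `EndpointHandler` class with `__init__` and `__call__`. The sketch below shows what such a handler could look like for this service; it is an assumption about the structure, not the project's actual handler. The optional `model` argument is added only so the sketch can be exercised with a stub, and the shape of the returned dict is hypothetical.

```python
import base64
import io


class EndpointHandler:
    def __init__(self, path="", model=None):
        if model is None:
            # Imported lazily so the sketch can run with a stub model
            from faster_whisper import WhisperModel

            model = WhisperModel("large-v3")
        self.model = model

    def __call__(self, data):
        # Decode the base64 payload back into raw audio bytes
        audio_bytes = base64.b64decode(data["inputs"])
        segments, _info = self.model.transcribe(
            io.BytesIO(audio_bytes),
            language=data.get("language", "de"),
            task=data.get("task", "transcribe"),
        )
        # The segments are produced lazily, so iterating runs the inference
        return {
            "segments": [
                {"id": s.id, "start": s.start, "end": s.end, "text": s.text}
                for s in segments
            ]
        }
```

A real deployment would also need `faster_whisper` listed in the repository's `requirements.txt` so that the endpoint installs it.
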