# Faster Whisper Transcription Service

## Overview

This project uses the `faster_whisper` Python package to provide an API endpoint for audio transcription. It runs
OpenAI's Whisper large-v3 model for speech-to-text conversion. The service is designed to be deployed on
Hugging Face endpoints.

## Features

- **Efficient Transcription**: Uses the Whisper large-v3 model for high-quality transcription.
- **Multilingual Support**: Transcribes multiple languages; the default language is German (`de`).
- **Segmented Output**: Returns the transcribed text as segments, each with a segment ID and timestamps.
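
The exact response schema is not documented here, so the following is only a minimal sketch of how segmented output
could be rendered on the client side. It assumes each segment is a dict with the keys `id`, `start`, `end`, and
`text`; these key names and the helper `format_segments` are hypothetical and should be checked against the real
response.

```python
from typing import Any


def format_segments(segments: list[dict[str, Any]]) -> list[str]:
    """Render each segment as '[id] start-end: text'.

    The keys 'id', 'start', 'end', and 'text' are assumed, not confirmed.
    """
    lines = []
    for seg in segments:
        lines.append(
            f"[{seg['id']}] {seg['start']:.2f}s-{seg['end']:.2f}s: {seg['text'].strip()}"
        )
    return lines


# Made-up example data for illustration, not real model output
example = [
    {"id": 1, "start": 0.0, "end": 2.5, "text": " Guten Morgen."},
    {"id": 2, "start": 2.5, "end": 5.0, "text": " Wie geht es Ihnen?"},
]

for line in format_segments(example):
    print(line)
```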


## Usage

```python
import requests
import os

# Sample payload with the base64-encoded audio and the desired language for transcription
DATA = {
    "inputs": "<base64_encoded_audio>",
    "language": "de",
    "task": "transcribe"
}

HF_ACCESS_TOKEN = os.environ.get("HF_TRANSCRIPTION_ACCESS_TOKEN")
API_URL = os.environ.get("HF_TRANSCRIPTION_ENDPOINT")

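HEADERS = {
    # Hugging Face endpoints typically expect "Bearer <token>"; this assumes
    # the environment variable already contains that prefix.
    "Authorization": HF_ACCESS_TOKEN,
    "Content-Type": "application/json"
}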

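response = requests.post(API_URL, headers=HEADERS, json=DATA)
response.raise_for_status()  # raise on HTTP errors instead of failing silently
print(response.json())  # parsed transcription result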

```
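
The `<base64_encoded_audio>` placeholder has to be filled with the audio content encoded as a base64 string. A
minimal sketch of building the payload from a local file follows; the helper names `build_payload` and
`load_payload` and the file name `sample.wav` are hypothetical.

```python
import base64
from pathlib import Path


def build_payload(audio_bytes: bytes, language: str = "de", task: str = "transcribe") -> dict:
    """Wrap raw audio bytes in the JSON payload shape shown in the usage example."""
    # b64encode returns bytes; decode to str so the payload is JSON-serializable
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {"inputs": encoded, "language": language, "task": task}


def load_payload(path: str, language: str = "de") -> dict:
    """Read an audio file from disk and build the request payload."""
    return build_payload(Path(path).read_bytes(), language=language)


# Example (hypothetical file name):
# DATA = load_payload("sample.wav")
```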

## Logging

Logging is set to the debug level and reports details of the transcription process: the length of the decoded
bytes, the progress of segments as they are transcribed, and a confirmation once inference has completed.
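
As an illustration of the kind of logging described above, here is a sketch using only the standard library. The
logger name, the message wording, the assumed segment keys, and the helpers `decode_audio` and `log_segments` are
assumptions, not the service's actual code.

```python
import base64
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("transcription")  # logger name is an assumption


def decode_audio(inputs: str) -> bytes:
    """Decode the base64 payload and log how many bytes were received."""
    audio_bytes = base64.b64decode(inputs)
    logger.debug("Decoded %d bytes of audio", len(audio_bytes))
    return audio_bytes


def log_segments(segments) -> int:
    """Log each segment as it is consumed, then confirm completion."""
    count = 0
    for seg in segments:
        # Segment keys are assumed for this sketch
        logger.debug("Segment %s: %.2fs-%.2fs", seg["id"], seg["start"], seg["end"])
        count += 1
    logger.debug("Inference completed (%d segments)", count)
    return count
```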

## Deployment

This service is intended for deployment on Hugging Face endpoints. Ensure you follow Hugging Face's guidelines for
deploying model endpoints.
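
For orientation, a custom handler for Hugging Face Inference Endpoints is conventionally a `handler.py` that
defines an `EndpointHandler` class with `__init__` and `__call__`. The sketch below shows how such a handler might
wrap `faster_whisper`; it is not this project's actual handler. The `device` and `compute_type` values and the
response shape are assumptions.

```python
import base64
import io
from typing import Any


class EndpointHandler:
    """Sketch of a custom handler; not the project's actual implementation."""

    def __init__(self, path: str = "") -> None:
        # Imported here so the module loads even where faster_whisper is absent
        from faster_whisper import WhisperModel

        # device and compute_type are assumptions; adjust to the endpoint hardware
        self.model = WhisperModel("large-v3", device="cuda", compute_type="float16")

    def __call__(self, data: dict[str, Any]) -> dict[str, Any]:
        # transcribe() accepts a file-like object, so wrap the decoded bytes
        audio = io.BytesIO(base64.b64decode(data["inputs"]))
        segments, _info = self.model.transcribe(
            audio,
            language=data.get("language", "de"),
            task=data.get("task", "transcribe"),
        )
        # segments is a generator; iterating it runs the actual transcription
        return {
            "segments": [
                {"id": s.id, "start": s.start, "end": s.end, "text": s.text}
                for s in segments
            ]
        }
```

Because `transcribe` returns a lazy generator, the work happens while the list comprehension iterates over it,
which is also the natural place to log per-segment progress.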