ManBib committed · Commit f9a5b24 · 1 Parent(s): ab0e749

updated README

Files changed (2)
  1. README.md +24 -61
  2. requirements.txt +0 -0
README.md CHANGED
@@ -1,22 +1,24 @@
- # Advanced Speech Processing with faster-whisper
-
- Welcome to the advanced speech processing utility leveraging the powerful Whisper large-v2 model for the CTranslate2
- framework. This tool is designed for high-performance speech recognition and processing, supporting a wide array of
- languages and the capability to handle video inputs for slide detection and audio transcription.
-
- ## Features
-
- - **Language Support**: Extensive language support covering major global languages for speech recognition tasks.
- - **Video Processing**: Download MP4 files from links and extract audio content for transcription.
- - **Slide Detection**: Detect and sort presentation slides from video lectures or meetings.
- - **Audio Transcription**: Leverage the Whisper large-v2 model to transcribe audio content with high accuracy.
-
- ## Getting Started
-
- To begin using this utility, set up the `WhisperModel` from the `faster_whisper` package with the provided language
- configurations. The `EndpointHandler` class is your main interface for processing the data.
-
- ### Example Usage
-
  ```python
  import requests
@@ -24,11 +26,9 @@ import os

  # Sample data dict with the link to the video file and the desired language for transcription
  DATA = {
- "inputs": "<base64_encoded_audio_string>",
- "link": "<your_mp4_video_link>",
- "language": "en", # Choose from supported languages
- "task": "transcribe",
- "type": "audio" # Use "link" for video files
  }

  HF_ACCESS_TOKEN = os.environ.get("HF_TRANSCRIPTION_ACCESS_TOKEN")
@@ -45,49 +45,12 @@ print(response)

  # The response will contain transcribed audio and detected slides if a video link was provided
  ```

- ### Processing Video Files
-
- To process video files, the `process_video` function downloads the MP4 file, extracts the audio, and passes it to the
- Whisper model for transcription. It also utilizes the `Detector` and `SlideSorter` classes to identify and sort
- presentation slides within the video.
-
- ### Error Handling
-
- Comprehensive logging and error handling are in place to ensure you're informed of each step's success or failure.
-
- ## Installation
-
- Ensure that you have the following dependencies installed:
-
- ```plaintext
- opencv-python~=4.8.1.78
- numpy~=1.26.1
- Pillow~=10.0.1
- tqdm~=4.66.1
- requests~=2.31.0
- moviepy~=1.0.3
- scipy~=1.11.3
- ```
-
- Install them using pip with the provided `requirements.txt` file:
-
- ```bash
- pip install -r requirements.txt
- ```
-
- ## Languages Supported
-
- This tool supports a plethora of languages, making it highly versatile for global applications. The full list of
- supported languages can be found in the `language` section of the old README.
-
- ## License
-
- This project is available under the MIT license.
-
- ## More Information
-
- For more information about the original Whisper large-v2 model, please refer to
- its [model card on Hugging Face](https://huggingface.co/openai/whisper-large-v2).
-
- ---
+ # Faster Whisper Transcription Service
+
+ ## Overview
+
+ This project uses the `faster_whisper` Python package to provide an API endpoint for audio transcription. It utilizes
+ OpenAI's Whisper model (large-v3) for accurate and efficient speech-to-text conversion. The service is designed to be
+ deployed on Hugging Face endpoints.
+
+ ## Features
+
+ - **Efficient Transcription**: Utilizes the large-v3 Whisper model for high-quality transcription.
+ - **Multilingual Support**: Supports transcription in various languages, with the default language set to German (de).
+ - **Segmented Output**: Returns transcribed text with segment IDs and timestamps for each transcribed segment.
+ ## Usage
+
  ```python
  import requests

  # Sample data dict with the link to the video file and the desired language for transcription
  DATA = {
+ "inputs": "<base64_encoded_audio>",
+ "language": "en",
+ "task": "transcribe"
  }

  HF_ACCESS_TOKEN = os.environ.get("HF_TRANSCRIPTION_ACCESS_TOKEN")

  # The response will contain transcribed audio and detected slides if a video link was provided
  ```
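The hunk above shows only the changed payload fields; the surrounding request code is elided by the diff viewer. A self-contained script under the new schema might look like the sketch below. The endpoint URL is a placeholder, `sample.wav` is an illustrative file name, and the guard keeps the network call from firing unless a token and file are actually present:

```python
import base64
import os

import requests


def build_payload(audio_path: str, language: str = "en") -> dict:
    """Base64-encode a local audio file into the request schema shown in the diff."""
    with open(audio_path, "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode("utf-8")
    return {"inputs": audio_b64, "language": language, "task": "transcribe"}


# Placeholder URL -- substitute your deployed Hugging Face endpoint.
API_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"
HF_ACCESS_TOKEN = os.environ.get("HF_TRANSCRIPTION_ACCESS_TOKEN")

if HF_ACCESS_TOKEN and os.path.exists("sample.wav"):
    # Only call the endpoint when a token is configured and the file exists.
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {HF_ACCESS_TOKEN}"},
        json=build_payload("sample.wav"),
    )
    print(response.json())
```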
+ ## Logging
+
+ Logging is set up at debug level, providing detailed information during the transcription process, including the length
+ of decoded bytes, the progress of segments being transcribed, and a confirmation once inference is completed.
+ ## Deployment
+
+ This service is intended for deployment on Hugging Face endpoints. Ensure you follow Hugging Face's guidelines for
+ deploying model endpoints.
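Hugging Face custom endpoints load a `handler.py` exposing an `EndpointHandler` class, which the repository's previous README also named as the main interface. A skeleton consistent with that interface is sketched below; the transcription itself is stubbed out, since the real handler wraps the `faster_whisper` large-v3 model, and the returned field names are an assumption based on the segmented output the README describes:

```python
class EndpointHandler:
    """Skeleton of the handler interface Hugging Face custom endpoints expect."""

    def __init__(self, path: str = ""):
        # The real handler would load the faster_whisper large-v3 model here.
        self.model_path = path

    def __call__(self, data: dict) -> list:
        # README: default language is German (de).
        language = data.get("language", "de")
        task = data.get("task", "transcribe")
        # The real handler decodes data["inputs"] and runs the Whisper model;
        # here we only echo the request shape as segment-style records.
        return [{"id": 0, "start": 0.0, "end": 0.0, "text": "",
                 "language": language, "task": task}]
```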
requirements.txt CHANGED
Binary files a/requirements.txt and b/requirements.txt differ