vinayakrevankar committed on
Commit 1a00545 · 1 Parent(s): c7bfded

minor changes and updated readme

Files changed (2)
  1. README.md +73 -0
  2. app.py +2 -2
README.md CHANGED
@@ -10,3 +10,76 @@ pinned: false
  ---
 
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # QuickTranscribe
+
+ This is a Python-based web application that allows users to upload audio files or use a microphone to transcribe audio into text using Automatic Speech Recognition (ASR). The app also provides additional details like RAM utilization during the transcription process. It uses the **"openai/whisper-large-v3"** model from Hugging Face for transcription.
+
+ ## Features
+
+ - **Microphone and File Upload Support**: Users can transcribe audio from either a microphone or an uploaded audio file.
+ - **Local and API-based Transcription**: Option to use a local model or an API for transcription.
+ - **RAM Utilization Display**: Shows how much RAM was utilized during the transcription process.
+ - **Real-time Speech-to-Text Transcription**: Converts audio to text in real-time with time-tracking.
+ - **Model Used**: The application uses the **"openai/whisper-large-v3"** model for transcription, which is part of Hugging Face's library.
+
+ ## Installation
+
+ ### Prerequisites
+
+ - Python 3.x
+ - `psutil` library for RAM usage tracking
+ - `gradio` for the web interface
+ - `transformers` library for the ASR pipeline
+ - `huggingface_hub` for API access
+
+ You can install the required dependencies using pip:
+
+ ```bash
+ pip install psutil gradio transformers huggingface_hub
+ ```
+
+ ### Clone the repository
+
+ ```bash
+ git clone https://github.com/VenkateshRoshan/MLOPs-CaseStudy1.git
+ cd MLOPs-CaseStudy1
+ ```
+
+ ## Usage
+
+ ### Running the Application
+
+ To start the application, run the following command:
+
+ ```bash
+ python app.py
+ ```
+
+ This will launch a Gradio interface where you can choose to transcribe either using an uploaded audio file or the microphone input.
+
+ ### Options
+
+ - **Microphone Input**: Click on the "Microphone" tab to start recording and transcribe the audio.
+ - **Audio File Upload**: Use the "Audio File" tab to upload an audio file for transcription.
+ - **Use API**: Check the "Use API" checkbox if you want to use the Hugging Face API for transcription instead of the local pipeline.
+
+ ### Output
+
+ - **Transcribed Text**: The text transcribed from the uploaded or recorded audio will be displayed.
+ - **Time Taken**: The time taken for the transcription process is displayed.
+ - **RAM Utilization**: A text box shows the RAM usage details, including the amount of RAM used and the percentage of the total system RAM during the transcription process.
+
+ ## Example Output
+
+ Here’s an example of the displayed output:
+
+ - **Transcribed Text**: "This is an example transcription."
+ - **Time Taken**: "Using API it took: 12.34 seconds"
+ - **RAM Utilization**: "RAM Used: 0.56 GB (3.45%), Total RAM: 16.0 GB"
+
+ ## Future Enhancements
+
+ - **GPU Integration**: To address performance issues with CPU processing, integrating the product with Hugging Face’s GPU instances could significantly speed up transcription times, especially for longer audio files or real-time applications. Offering GPU as an option would provide a faster, more scalable solution for users who need high-speed transcription services.
+ - **Batch Processing and Caching**: Implementing batch processing or caching for repeated tasks (such as transcribing the same file multiple times) could reduce resource usage and improve performance. By grouping multiple audio files or requests together, the product could optimize processing times and reduce wait times for users.
+ - **Enhanced User Interface Features**: The user experience could be further enhanced by adding features like audio segmentation (to break up long audio files into smaller parts) and progress indicators during transcription. This would improve the usability of the product, especially for users transcribing lengthy recordings.
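
Note on the "Batch Processing and Caching" item in the README above: one way the caching half could look is sketched below. This is a rough illustration under stated assumptions, not code from this repository. `TranscriptionCache` and `fake_transcriber` are hypothetical names, and the stand-in transcriber returns a fixed string instead of calling Whisper. The cache key is a SHA-256 digest of the raw audio bytes, so uploading the same file twice would skip the second model run.

```python
import hashlib
from typing import Callable, Dict


class TranscriptionCache:
    """Cache transcripts keyed by a SHA-256 digest of the audio bytes.

    Hypothetical helper for illustration; not part of the app in this commit.
    """

    def __init__(self, transcribe_fn: Callable[[bytes], str]) -> None:
        self._transcribe_fn = transcribe_fn
        self._cache: Dict[str, str] = {}
        self.misses = 0  # how many times the underlying transcriber actually ran

    def transcribe(self, audio_bytes: bytes) -> str:
        # Identical audio content always hashes to the same key.
        key = hashlib.sha256(audio_bytes).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._transcribe_fn(audio_bytes)
        return self._cache[key]


def fake_transcriber(audio_bytes: bytes) -> str:
    # Stand-in for the real ASR call (assumption: the real one takes audio
    # and returns text); it only reports the input size.
    return f"transcript of {len(audio_bytes)} bytes"


if __name__ == "__main__":
    cache = TranscriptionCache(fake_transcriber)
    cache.transcribe(b"same audio")
    cache.transcribe(b"same audio")  # served from the cache
    print("transcriber runs:", cache.misses)
```

An in-memory dictionary only helps within a single process; whether that is enough depends on how the Space is restarted and scaled, which the README does not say.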
app.py CHANGED
@@ -67,7 +67,7 @@ mf_transcribe = gr.Interface(
  ],
  outputs=[gr.Textbox(label="Transcribed Text", type="text"),
  gr.Textbox(label="Time taken", type="text"),
- gr.Textbox(label="Utilization", type="text")
+ gr.Textbox(label="RAM Utilization", type="text")
  ], # Placeholder for transcribed text and time taken
  title="Welcome to QuickTranscribe",
  description=(
@@ -85,7 +85,7 @@ file_transcribe = gr.Interface(
  ],
  outputs=[ gr.Textbox(label="Transcribed Text", type="text"),
  gr.Textbox(label="Time taken", type="text"),
- gr.Textbox(label="Utilization", type="text")
+ gr.Textbox(label="RAM Utilization", type="text")
  ], # Placeholder for transcribed text and time taken
  title="Welcome to QuickTranscribe",
  description=(
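
Note on the app.py hunks: each `gr.Interface` lists three output textboxes, so the function wired to it needs to return three values in the same order (text, timing message, RAM message). The sketch below shows one possible shape for such a callback using only the standard library. It is not the code in app.py: `transcribe_with_stats`, `format_ram_usage`, and the `asr` and `ram_probe` parameters are hypothetical, the "local model" wording is an assumption, and the message formats simply mirror the README's example output. In the real app the RAM figures presumably come from `psutil`, which the README lists as a dependency.

```python
import time
from typing import Callable, Tuple

GIB = 1024 ** 3  # bytes per GiB; the README labels the unit "GB"


def format_ram_usage(used_bytes: float, total_bytes: float) -> str:
    """Build a message in the style of the README's RAM Utilization example."""
    used_gb = used_bytes / GIB
    total_gb = total_bytes / GIB
    percent = 100.0 * used_bytes / total_bytes
    return f"RAM Used: {used_gb:.2f} GB ({percent:.2f}%), Total RAM: {total_gb:.1f} GB"


def transcribe_with_stats(
    audio: bytes,
    asr: Callable[[bytes], str],
    ram_probe: Callable[[], Tuple[float, float]],
    use_api: bool = False,
) -> Tuple[str, str, str]:
    """Return one value per output textbox: text, time taken, RAM utilization."""
    start = time.perf_counter()
    text = asr(audio)  # hypothetical stand-in for the local pipeline or API call
    elapsed = time.perf_counter() - start

    # ram_probe returns (used_bytes, total_bytes); with psutil installed these
    # could be read from psutil.virtual_memory() (assumption about the app).
    used_bytes, total_bytes = ram_probe()

    mode = "API" if use_api else "local model"  # "local model" wording is assumed
    time_msg = f"Using {mode} it took: {elapsed:.2f} seconds"
    return text, time_msg, format_ram_usage(used_bytes, total_bytes)


if __name__ == "__main__":
    outputs = transcribe_with_stats(
        b"\x00\x01",
        asr=lambda audio: "This is an example transcription.",
        ram_probe=lambda: (4 * GIB, 16 * GIB),
        use_api=True,
    )
    for value in outputs:
        print(value)
```

Passing the ASR call and the RAM probe in as parameters keeps the formatting logic testable without loading a model; the actual app may well inline both.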