ArunSamespace committed on
Commit
9314fc1
·
verified ·
1 Parent(s): ce5596c

Upload 41 files

Files changed (41)
  1. Dockerfile +43 -0
  2. LICENSE +201 -0
  3. README.md +44 -4
  4. app/.DS_Store +0 -0
  5. app/app.py +40 -0
  6. app/modules/.DS_Store +0 -0
  7. app/modules/__init__.py +34 -0
  8. app/modules/__pycache__/__init__.cpython-310.pyc +0 -0
  9. app/modules/__pycache__/__init__.cpython-38.pyc +0 -0
  10. app/modules/__pycache__/__init__.cpython-39.pyc +0 -0
  11. app/modules/__pycache__/utils.cpython-310.pyc +0 -0
  12. app/modules/__pycache__/utils.cpython-38.pyc +0 -0
  13. app/modules/__pycache__/utils.cpython-39.pyc +0 -0
  14. app/modules/audio/__init__.py +18 -0
  15. app/modules/audio/__pycache__/__init__.cpython-38.pyc +0 -0
  16. app/modules/emotion/__init__.py +24 -0
  17. app/modules/emotion/__pycache__/__init__.cpython-310.pyc +0 -0
  18. app/modules/emotion/__pycache__/__init__.cpython-38.pyc +0 -0
  19. app/modules/emotion/__pycache__/__init__.cpython-39.pyc +0 -0
  20. app/modules/transcription/__init__.py +30 -0
  21. app/modules/transcription/__pycache__/__init__.cpython-38.pyc +0 -0
  22. app/modules/utils.py +20 -0
  23. app/static/audio_to_text.css +85 -0
  24. app/static/audiodisplay.js +18 -0
  25. app/static/footer.js +8 -0
  26. app/static/footer_file.css +51 -0
  27. app/static/header.js +8 -0
  28. app/static/header_file.css +77 -0
  29. app/static/main.css +83 -0
  30. app/static/main.js +184 -0
  31. app/static/recorder.js +118 -0
  32. app/static/recorderWorker.js +161 -0
  33. app/static/text_to_speech.png +0 -0
  34. app/templates/audio_to_text.html +46 -0
  35. app/templates/footer.html +11 -0
  36. app/templates/header.html +22 -0
  37. app/templates/index.html +38 -0
  38. app/tmp/audio.wav +0 -0
  39. app/tmp/test.mp3 +0 -0
  40. app/wsgi.py +4 -0
  41. requirements.txt +7 -0
Dockerfile ADDED
@@ -0,0 +1,43 @@
+ FROM python:3.8.10-slim
+
+ RUN export DEBIAN_FRONTEND=noninteractive \
+     && apt-get -qq update \
+     && apt install software-properties-common -y \
+     && apt-get install build-essential -y \
+     && python3 --version \
+     && apt update \
+     && apt install ffmpeg -y \
+     && rm -rf /var/lib/apt/lists/* \
+     && apt update \
+     && apt install nginx -y
+
+ ### Set up user with permissions
+ # Set up a new user named "user" with user ID 1000
+ RUN useradd -m -u 1000 user
+
+ # Switch to the "user" user
+ USER user
+
+ # Set home to the user's home directory
+ ENV HOME=/home/user \
+     PATH=/home/user/.local/bin:$PATH
+
+ # Set the working directory to the user's home directory
+ WORKDIR $HOME/app
+
+ ### Set up app-specific content
+ COPY requirements.txt requirements.txt
+ RUN pip3 install -r requirements.txt
+ RUN pip3 install gunicorn
+
+ # Copy the current directory contents into the container at $HOME/app setting the owner to the user
+ COPY --chown=user app $HOME/app
+
+ ### Update permissions for the app
+ USER root
+ RUN chmod 777 ~/app/*
+ USER user
+
+ # RUN python3 server.py
+ ENTRYPOINT ["python3", "app.py"]
+ # ENTRYPOINT ["gunicorn", "--timeout", "600", "wsgi:app"]
LICENSE ADDED
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "[]"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright [yyyy] [name of copyright owner]
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
README.md CHANGED
@@ -1,11 +1,51 @@
  ---
- title: SpeechEmotionV2
- emoji: 🌖
- colorFrom: blue
- colorTo: pink
+ title: Test
+ emoji: 👀
+ colorFrom: yellow
+ colorTo: purple
  sdk: docker
  pinned: false
  license: apache-2.0
  ---

  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # AI-Powered Mood-Based Music Recommendation System
+
+ **Welcome to the AI-Powered Mood-Based Music Recommendation System**, a captivating project developed using Python and its frameworks. This innovative system aims to offer personalized music recommendations based on the user's current mood, creating an immersive and enjoyable music experience.
+
+ ## Overview
+
+ The project opens with an audio prompt asking users, *"How are you feeling today?"* The user's response is transcribed into text using the Whisper library. An AI model trained for emotion classification then analyzes this text to identify the user's emotional state.
+
+ ## Features
+
+ - **Audio Prompt:** Engage with an audio message that initiates the interaction.
+ - **Speech-to-Text:** Whisper library converts spoken responses into text.
+ - **Emotion Classification:** AI model accurately classifies user emotions.
+ - **Music Recommendations:** Based on emotions, receive tailored song recommendations.
+ - **User-Friendly Interface:** Python frameworks create an intuitive web-based UI.
+
+ ## Commands
+
+ ### Build
+ ```shell
+ docker build --tag mood-based --file Dockerfile .
+ ```
+
+
+ ### Run
+ ```shell
+ docker run -dit --name mood-based --network host -v /Users/arunaddagatla/Arun/Adarsh/SpeechEmotionV2/app/:/app/ mood-based:latest
+ ```
+
+ ### To go inside docker
+
+ ```shell
+ docker exec -it mood-based bash
+ ```
+
+ ### Stop and remove docker
+ ```shell
+ docker stop mood-based && docker rm mood-based
+ ```
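Review note: the README lists music recommendations as a feature, but no recommendation code is visible in this commit. The sketch below shows one way a detected emotion label could be mapped to playlists. The `PLAYLISTS` table, the playlist names, and the `recommend` helper are all hypothetical and are not part of the repository.

```python
from typing import Dict, List

# Hypothetical mapping from emotion labels to placeholder playlist names.
# Nothing here comes from the repository; adjust labels to the classifier in use.
PLAYLISTS: Dict[str, List[str]] = {
    "joy": ["upbeat-pop", "feel-good-classics"],
    "sadness": ["mellow-acoustic", "rainy-day-piano"],
    "anger": ["calm-ambient", "slow-jazz"],
}

# Fallback for labels that have no entry in the table above.
DEFAULT_PLAYLISTS: List[str] = ["daily-mix"]


def recommend(emotion: str) -> List[str]:
    """Return playlist names for an emotion label, ignoring case and padding."""
    return PLAYLISTS.get(emotion.strip().lower(), DEFAULT_PLAYLISTS)


if __name__ == "__main__":
    print(recommend("joy"))
    print(recommend("curiosity"))
```

A lookup table keeps the mapping easy to audit; a real system would likely replace the placeholder names with track or playlist identifiers from a music service.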
app/.DS_Store ADDED
Binary file (6.15 kB)
 
app/app.py ADDED
@@ -0,0 +1,40 @@
+ import os
+ import uuid
+
+ from flask import Flask, flash, render_template, request
+ from flask_cors import CORS
+ from modules import Module
+
+ model = Module()
+
+ app = Flask(__name__)
+ CORS(app)
+
+ app.secret_key = "Testing"
+
+ @app.route('/')
+ def index():
+     flash(" Welcome to My Website")
+     return render_template('index.html')
+
+ @app.route('/audio_to_text/')
+ def audio_to_text():
+     flash(" Press Start to start recording audio and press Stop to end recording audio")
+     return render_template('audio_to_text.html')
+
+ @app.route('/audio', methods=['POST'])
+ def audio():
+     try:
+         output_file = f"./tmp/{uuid.uuid4()}.wav"
+         with open(output_file, 'wb') as wav_file:  # close the handle before the model reads the file
+             wav_file.write(request.data)
+         text, emotion = model.predict(audio_path=output_file)
+         os.remove(output_file)
+         return_text = f" Transcription: {text} <br> Emotion: {emotion} "
+     except Exception:
+         return_text = " Sorry!!!! Voice not Detected "
+     return return_text
+
+
+ if __name__ == "__main__":
+     app.run(debug=True, port=7860, host='0.0.0.0')
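Review note: in the `/audio` handler above, `os.remove` is skipped whenever `model.predict` raises or returns `None`, so failed requests can leave `.wav` files behind in `./tmp`. The following is a minimal sketch of the same flow with cleanup in a `finally` block. `handle_upload` and its injected `predict` callable are hypothetical names used only for illustration; the response strings are copied from the handler.

```python
import os
import tempfile
from typing import Callable, Optional, Tuple

# predict() in this commit returns (text, emotion) or None.
Prediction = Optional[Tuple[str, str]]

NOT_DETECTED = " Sorry!!!! Voice not Detected "


def handle_upload(wav_bytes: bytes, predict: Callable[[str], Prediction]) -> str:
    """Write the upload to a temp file, run `predict`, and always clean up."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        # Close the file before the model opens it for reading.
        with os.fdopen(fd, "wb") as wav_file:
            wav_file.write(wav_bytes)
        result = predict(path)
        if result is None:
            return NOT_DETECTED
        text, emotion = result
        return f" Transcription: {text} <br> Emotion: {emotion} "
    except Exception:
        return NOT_DETECTED
    finally:
        # Runs on success, on a None result, and on any exception.
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    print(handle_upload(b"\x00\x01", lambda path: ("hello", "joy")))
```

Passing `predict` as an argument keeps the sketch runnable without loading the models; in the Flask handler it would be `lambda p: model.predict(audio_path=p)`.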
app/modules/.DS_Store ADDED
Binary file (6.15 kB)
 
app/modules/__init__.py ADDED
@@ -0,0 +1,34 @@
+ import time
+ from typing import Optional, Tuple
+
+ from modules.emotion import Emotion
+ from modules.transcription import Transcription
+
+ transcription_model = "tiny.en"
+ emotion_model = "joeddav/distilbert-base-uncased-go-emotions-student"
+
+ transcription_obj = Transcription(model_name=transcription_model)
+ emotion_obj = Emotion(model_name=emotion_model)
+
+ class Module:
+
+     def predict(self, audio_path: str) -> Optional[Tuple[str, str]]:
+         """Loads audio, gets transcription and detects emotion
+
+         Args:
+             audio_path (str): path to the audio file
+
+         Returns:
+             Optional[Tuple[str, str]]: (text, emotion), or None if nothing was transcribed
+         """
+         print("Getting transcription...")
+         start_time = time.time()
+         if text := transcription_obj.transcribe(audio_path=audio_path):
+             print("Text: ", text, time.time() - start_time)
+
+             start_time = time.time()
+             emotion = emotion_obj.detect_emotion(text=text)
+             print("Emotion: ", emotion, time.time() - start_time)
+             return text, emotion
+         return None
+
app/modules/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (1.59 kB)

app/modules/__pycache__/__init__.cpython-38.pyc ADDED
Binary file (1.16 kB)

app/modules/__pycache__/__init__.cpython-39.pyc ADDED
Binary file (1.21 kB)

app/modules/__pycache__/utils.cpython-310.pyc ADDED
Binary file (707 Bytes)

app/modules/__pycache__/utils.cpython-38.pyc ADDED
Binary file (678 Bytes)

app/modules/__pycache__/utils.cpython-39.pyc ADDED
Binary file (726 Bytes)
 
app/modules/audio/__init__.py ADDED
@@ -0,0 +1,18 @@
+ from typing import Any
+
+ import whisper
+
+
+ class Audio:
+     @classmethod
+     def load_audio(cls, audio_path: str) -> Any:
+         """Loads audio file from the disk
+
+         Args:
+             audio_path (str): path of the audio file
+
+         Returns:
+             Any: loaded audio file in numbers
+         """
+         return whisper.load_audio(audio_path)
+
app/modules/audio/__pycache__/__init__.cpython-38.pyc ADDED
Binary file (693 Bytes)
 
app/modules/emotion/__init__.py ADDED
@@ -0,0 +1,24 @@
+ from typing import Any
+
+ from modules.utils import Pipeline, load_model
+
+
+ class Emotion:
+     task: str = "text-classification"
+
+     def __init__(self, model_name: str) -> None:
+         # model_name: str = "/models/distilbert-base-uncased-go-emotions-student/"
+         print("Loading emotion model...")
+         self.emotion_model: Pipeline = load_model(task=self.task, model=model_name)
+         print("Loaded emotion model")
+
+     def detect_emotion(self, text: str) -> str:
+         """Detects emotion of the given text
+
+         Args:
+             text (str): text
+
+         Returns:
+             str: emotion
+         """
+         return self.emotion_model(text)[0]['label']
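Review note: `detect_emotion` indexes `[0]['label']`, which relies on the classifier returning a non-empty list of `{'label', 'score'}` dicts with the best label first. The sketch below is a slightly more defensive selection that assumes only that flat list-of-dicts shape. `top_label` and its `"neutral"` default are hypothetical and not part of the repository.

```python
from typing import Dict, List, Union

# Assumed output shape: [{'label': 'joy', 'score': 0.93}, ...]
ClassifierOutput = List[Dict[str, Union[str, float]]]


def top_label(predictions: ClassifierOutput, default: str = "neutral") -> str:
    """Return the label with the highest score, or `default` for empty output."""
    if not predictions:
        return default
    best = max(predictions, key=lambda item: float(item["score"]))
    return str(best["label"])


if __name__ == "__main__":
    sample = [
        {"label": "sadness", "score": 0.2},
        {"label": "joy", "score": 0.7},
    ]
    print(top_label(sample))
```

Selecting by score rather than by position stays correct if the pipeline is later configured to return several labels per input.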
app/modules/emotion/__pycache__/__init__.cpython-310.pyc ADDED
Binary file (1.06 kB)

app/modules/emotion/__pycache__/__init__.cpython-38.pyc ADDED
Binary file (1.04 kB)

app/modules/emotion/__pycache__/__init__.cpython-39.pyc ADDED
Binary file (1.09 kB)
 
app/modules/transcription/__init__.py ADDED
@@ -0,0 +1,30 @@
+ import torch
+ from faster_whisper import WhisperModel
+
+
+ class Transcription:
+
+     def __init__(self, model_name: str) -> None:
+         print("Loading whisper model...")
+         self.whisper_model = self.load_whisper(model_id=model_name)
+         print("Loaded whisper model")
+
+     def load_whisper(self, model_id: str):
+
+         device = "cuda" if torch.cuda.is_available() else "cpu"
+         torch_dtype = "float16" if torch.cuda.is_available() else "float32"
+
+         return WhisperModel(model_id, device=device, compute_type=torch_dtype)
+
+     def transcribe(self, audio_path: str) -> str:
+         """Transcribes the given audio data
+
+         Args:
+             audio_path (str): audio path
+
+         Returns:
+             str: text
+         """
+         segments, info = self.whisper_model.transcribe(audio_path, language="en")
+         return "".join(segment.text for segment in segments)
+
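Review note: the two decisions in this module, choosing a device and compute type and then joining segment texts, can be checked without loading a model. The sketch below isolates both, assuming faster-whisper accepts the device names `cuda` and `cpu` and that segments expose a `.text` attribute, as the code above uses. `pick_device`, `join_segments`, and `FakeSegment` are hypothetical helpers for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class FakeSegment:
    """Stand-in for a transcription segment; only `.text` is used."""
    text: str


def pick_device(cuda_available: bool) -> Tuple[str, str]:
    """Return (device, compute_type), mirroring the choice in load_whisper."""
    if cuda_available:
        return "cuda", "float16"
    return "cpu", "float32"


def join_segments(segments: Iterable[FakeSegment]) -> str:
    """Concatenate segment texts and drop surrounding whitespace."""
    return "".join(segment.text for segment in segments).strip()


if __name__ == "__main__":
    print(pick_device(False))
    print(join_segments([FakeSegment(" I feel"), FakeSegment(" great today.")]))
```

With real code the first call would be `pick_device(torch.cuda.is_available())`. An empty result from `join_segments` is falsy, which is the case where `Module.predict` returns `None`.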
app/modules/transcription/__pycache__/__init__.cpython-38.pyc ADDED
Binary file (1.48 kB)
 
app/modules/utils.py ADDED
@@ -0,0 +1,20 @@
+ import torch
+ from transformers import pipeline
+ from transformers.pipelines.base import Pipeline
+
+
+ def load_model(task: str, model: str) -> Pipeline:
+     """Loads the given transformers model based on the given task
+
+     Args:
+         task (str): NLP task
+         model (str): transformers model
+
+     Returns:
+         Pipeline: transformers pipeline object
+     """
+     return pipeline(
+         task=task,
+         model=model,
+         device=0 if torch.cuda.is_available() else -1
+     )
app/static/audio_to_text.css ADDED
@@ -0,0 +1,85 @@
+
+ .column1 {
+     float: left;
+     width: 30%;
+     height: 400px; /* Should be removed. Only for demonstration */
+     background-color: #fff;
+ }
+
+ .column2 {
+     float: left;
+     width: 70%;
+     height: 400px; /* Should be removed. Only for demonstration */
+     background-color: #0f4da6;
+ }
+
+ /* Clear floats after the columns */
+ .row:after {
+
+     content: "";
+     display: table;
+     clear: both;
+ }
+
+ .row{
+     margin-left: 10%;
+     margin-right: 10%;
+     margin-bottom: 3%;
+     background-color: #060073;
+ }
+ /* Responsive layout - makes the two columns stack on top of each other instead of next to each other */
+ @media screen and (max-width: 600px) {
+     .column1 {
+         width: 100%;
+     }
+ }
+
+ @media screen and (max-width: 600px) {
+     .column2 {
+         width: 100%;
+     }
+ }
+
+ #imgInp {
+     opacity: 0;
+     position: absolute;
+     z-index: -1;
+ }
+
+ label {
+     cursor: pointer;
+     /* Style as you please, it will become the visible UI component. */
+     padding-left: 10%;
+     padding-right: 10%;
+     padding-top: 2%;
+     padding-bottom: 2%;
+     font-size: 100%;
+     color: #fff;
+     border-radius: 25px;
+     background-color: #0f4da6;
+
+ }
+
+ #blah{
+     max-width: 60%;
+     max-height: 60%;
+ }
+
+ #image_div1{
+     max-width: 70%;
+     max-height: 70%;
+ }
+
+ #stop,#start{
+     cursor: pointer;
+     /* Style as you please, it will become the visible UI component. */
+     padding-left: 10%;
+     padding-right: 10%;
+     padding-top: 2%;
+     padding-bottom: 2%;
+     font-size: 100%;
+     color: #fff;
+     border-radius: 25px;
+     background-color: #0f4da6;
+     margin-top: 10%;
+ }
app/static/audiodisplay.js ADDED
@@ -0,0 +1,18 @@
+ function drawBuffer( width, height, context, data ) {
+     var step = Math.ceil( data.length / width );
+     var amp = height / 2;
+     context.fillStyle = "silver";
+     context.clearRect(0,0,width,height);
+     for(var i=0; i < width; i++){
+         var min = 1.0;
+         var max = -1.0;
+         for (var j=0; j<step; j++) {
+             var datum = data[(i*step)+j];
+             if (datum < min)
+                 min = datum;
+             if (datum > max)
+                 max = datum;
+         }
+         context.fillRect(i,(1+min)*amp,1,Math.max(1,(max-min)*amp));
+     }
+ }
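Review note: `drawBuffer` reduces the sample buffer to one (min, max) pair per pixel column and draws a one-pixel-wide bar for each. The sketch below restates that reduction in Python so the arithmetic can be checked away from the canvas. It assumes samples lie in [-1, 1]; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def min_max_columns(data: Sequence[float], width: int) -> List[Tuple[float, float]]:
    """Return one (min, max) pair per column, seeded as in drawBuffer."""
    step = math.ceil(len(data) / width)
    columns = []
    for i in range(width):
        bucket = data[i * step:(i + 1) * step]
        # Seeds 1.0 and -1.0 match the JavaScript initial values, so an
        # empty bucket yields (1.0, -1.0).
        low = min([1.0, *bucket])
        high = max([-1.0, *bucket])
        columns.append((low, high))
    return columns


def column_rect(low: float, high: float, height: float) -> Tuple[float, float]:
    """Return (y, bar_height) for one column; bars are at least 1 px tall."""
    amp = height / 2
    return (1 + low) * amp, max(1, (high - low) * amp)


if __name__ == "__main__":
    cols = min_max_columns([0.0, 0.5, -0.5, 0.25], width=2)
    print(cols)
    print([column_rect(low, high, 100) for low, high in cols])
```

Taking the min and max of each bucket, rather than every `step`-th sample, preserves short peaks that plain decimation would miss.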
app/static/footer.js ADDED
@@ -0,0 +1,8 @@
+ function myFunction() {
+     var x = document.getElementById("myFootnav");
+     if (x.className === "footnov") {
+         x.className += " responsive";
+     } else {
+         x.className = "footnov";
+     }
+ }
app/static/footer_file.css ADDED
@@ -0,0 +1,51 @@
+
+ .footnov {
+     overflow: hidden;
+     background-color: black;
+ }
+
+ .footnov a {
+     float: left;
+     display: block;
+     color: #f2f2f2;
+     text-align: center;
+     padding: 14px 16px;
+     text-decoration: none;
+     font-size: 17px;
+ }
+
+ .footnov a:hover {
+     background-color: #ddd;
+     color: black;
+ }
+
+ .footnov a.active {
+     background-color:rgb(80, 61, 226);
+     color: white;
+ }
+
+ .footnov .icon {
+     display: none;
+ }
+
+ @media screen and (max-width: 100px) {
+     .footnov a:not(:first-child) {display: none;}
+     .footnov a.icon {
+         float: right;
+         display: block;
+     }
+ }
+
+ @media screen and (max-width: 100px) {
+     .footnov.responsive {position: relative;}
+     .footnov.responsive .icon {
+         position: absolute;
+         right: 0;
+         top: 0;
+     }
+     .footnov.responsive a {
+         float: none;
+         display: block;
+         text-align: left;
+     }
+ }
app/static/header.js ADDED
@@ -0,0 +1,8 @@
+ function myFunction() {
+     var x = document.getElementById("mytopnav");
+     if (x.className === "topnav") {
+         x.className += " responsive";
+     } else {
+         x.className = "topnav";
+     }
+ }
app/static/header_file.css ADDED
@@ -0,0 +1,77 @@
+ body {
+     margin: 0;
+     font-family: Arial, Helvetica, sans-serif;
+ }
+
+ .topnav {
+     overflow: hidden;
+     background-color: black;
+ }
+
+ .topnav a {
+     float: left;
+     display: block;
+     color: #f2f2f2;
+     text-align: center;
+     padding: 14px 16px;
+     text-decoration: none;
+     font-size: 17px;
+ }
+
+ .topnav a:hover {
+     background-color: #ddd;
+     color: black;
+ }
+
+ .topnav a.active {
+     background-color: rgb(80, 61, 226);
+     color: white;
+ }
+
+ .topnav .icon {
+     display: none;
+ }
+
+ @media screen and (max-width: 600px) {
+     .topnav a:not(:first-child) {display: none;}
+     .topnav a.icon {
+         float: right;
+         display: block;
+     }
+ }
+
+ @media screen and (max-width: 600px) {
+     .topnav.responsive {position: relative;}
+     .topnav.responsive .icon {
+         position: absolute;
+         right: 0;
+         top: 0;
+     }
+     .topnav.responsive a {
+         float: none;
+         display: block;
+         text-align: left;
+     }
+ }
+
+
+ .alert {
+     padding: 20px;
+     background-color: #f44336;
+     color: white;
+ }
+
+ .closebtn {
+     margin-left: 15px;
+     color: white;
+     font-weight: bold;
+     float: right;
+     font-size: 22px;
+     line-height: 20px;
+     cursor: pointer;
+     transition: 0.3s;
+ }
+
+ .closebtn:hover {
+     color: black;
+ }
app/static/main.css ADDED
@@ -0,0 +1,83 @@
+ #loader{
+     position:relative;
+     margin:0 auto;
+     clear:left;
+     height:auto;
+     z-index: 0;
+     text-align:center;
+ }
+
+
+ .glow {
+     font-size: 50px;
+     color: #fff;
+     text-align: center;
+     -webkit-animation: glow 1s ease-in-out infinite alternate;
+     -moz-animation: glow 1s ease-in-out infinite alternate;
+     animation: glow 1s ease-in-out infinite alternate;
+ }
+
+ @-webkit-keyframes glow {
+     from {
+         text-shadow: 0 0 5px #fff, 0 0 10px #fff, 0 0 15px #660073, 0 0 20px #660073, 0 0 25px #660073, 0 0 30px #060073, 0 0 35px #060073;
+     }
+
+     to {
+         text-shadow: 0 0 10px #fff, 0 0 15px #0f4da6, 0 0 20px #0f4da6, 0 0 25px #0f4da6, 0 0 30px #0f4da6, 0 0 35px #0f4da6, 0 0 40px #0f4da6;
+     }
+ }
+
+
+
+ .registerbtn {
+     border-radius: 25px;
+     position:relative;
+     color: white;
+     padding: 20px 20px;
+     margin: 0.9% 0;
+     border: none;
+     cursor: pointer;
+     font-size: 150%;
+     width: 40%;
+     opacity: 0.9;
+ }
+ .button1{
+     background-color: green;
+     /* margin-left: 3%; */
+ }
+ .button2{
+     background-color: blue;
+ }
+ .button3{
+     background-color: red;
+ }
+ .button4{
+     background-color:cornflowerblue;
+ }
+ .button5{
+     background-color:blueviolet;
+ }
+ .button span {
+     cursor: pointer;
+     display: inline-block;
+     position: relative;
+     transition: 0.5s;
+ }
+
+ .registerbtn span:after {
+     content: '\00bb';
+     position: absolute;
+     opacity: 0;
+     top: 0;
+     right: -20px;
+     transition: 0.5s;
+ }
+
+ .registerbtn:hover span {
+     padding-right: 25px;
+ }
+
+ .registerbtn:hover span:after {
+     opacity: 1;
+     right: 0;
+ }
app/static/main.js ADDED
@@ -0,0 +1,184 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ /* Copyright 2013 Chris Wilson
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License");
4
+ you may not use this file except in compliance with the License.
5
+ You may obtain a copy of the License at
6
+
7
+ http://www.apache.org/licenses/LICENSE-2.0
8
+
9
+ Unless required by applicable law or agreed to in writing, software
10
+ distributed under the License is distributed on an "AS IS" BASIS,
11
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ See the License for the specific language governing permissions and
13
+ limitations under the License.
14
+ */
15
+
16
+ window.AudioContext = window.AudioContext || window.webkitAudioContext;
17
+
18
+ var audioContext = new AudioContext();
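+ var audioInput = null,
+     realAudioInput = null,
+     inputPoint = null, analyserNode = null, zeroGain = null, // last two are assigned in gotStream; declared here so they are not implicit globals
+     audioRecorder = null;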
23
+ var rafID = null;
24
+ var analyserContext = null;
25
+ var canvasWidth, canvasHeight;
26
+ var recIndex = 0;
27
+
28
+
29
+ function gotBuffers(buffers) {
30
+ audioRecorder.exportMonoWAV(doneEncoding);
31
+ }
32
+
+ function doneEncoding(soundBlob) {
+     // POST the encoded WAV to the server and display the text it returns.
+     fetch('/audio', {method: "POST", body: soundBlob}).then(response => response.text().then(text => {
+         document.getElementById('output').textContent = text; // textContent keeps the server response from being parsed as HTML
+     }));
+     recIndex++;
+ }
40
+
41
+ function stopRecording() {
42
+ // stop recording
43
+ audioRecorder.stop();
44
+ document.getElementById('stop').disabled = true;
45
+ document.getElementById('start').removeAttribute('disabled');
46
+ audioRecorder.getBuffers(gotBuffers);
47
+ }
48
+
49
+ function startRecording() {
50
+
51
+ // start recording
52
+ if (!audioRecorder)
53
+ return;
54
+ document.getElementById('start').disabled = true;
55
+ document.getElementById('stop').removeAttribute('disabled');
56
+ audioRecorder.clear();
57
+ audioRecorder.record();
58
+ }
59
+
60
+ function convertToMono(input) {
61
+ var splitter = audioContext.createChannelSplitter(2);
62
+ var merger = audioContext.createChannelMerger(2);
63
+
64
+ input.connect(splitter);
65
+ splitter.connect(merger, 0, 0);
66
+ splitter.connect(merger, 0, 1);
67
+ return merger;
68
+ }
69
+
70
+ function cancelAnalyserUpdates() {
71
+ window.cancelAnimationFrame(rafID);
72
+ rafID = null;
73
+ }
74
+
75
+ function updateAnalysers(time) {
76
+ if (!analyserContext) {
77
+ var canvas = document.getElementById("analyser");
78
+ canvasWidth = canvas.width;
79
+ canvasHeight = canvas.height;
80
+ analyserContext = canvas.getContext('2d');
81
+ }
82
+
83
+ // analyzer draw code here
84
+ {
85
+ var SPACING = 3;
86
+ var BAR_WIDTH = 1;
87
+ var numBars = Math.round(canvasWidth / SPACING);
88
+ var freqByteData = new Uint8Array(analyserNode.frequencyBinCount);
89
+
90
+ analyserNode.getByteFrequencyData(freqByteData);
91
+
92
+ analyserContext.clearRect(0, 0, canvasWidth, canvasHeight);
93
+ analyserContext.fillStyle = '#F6D565';
94
+ analyserContext.lineCap = 'round';
95
+ var multiplier = analyserNode.frequencyBinCount / numBars;
96
+
97
+ // Draw rectangle for each frequency bin.
98
+ for (var i = 0; i < numBars; ++i) {
99
+ var magnitude = 0;
100
+ var offset = Math.floor(i * multiplier);
101
+ // gotta sum/average the block, or we miss narrow-bandwidth spikes
102
+ for (var j = 0; j < multiplier; j++)
103
+ magnitude += freqByteData[offset + j];
104
+ magnitude = magnitude / multiplier;
105
+ var magnitude2 = freqByteData[i * multiplier];
106
+ analyserContext.fillStyle = "hsl( " + Math.round((i * 360) / numBars) + ", 100%, 50%)";
107
+ analyserContext.fillRect(i * SPACING, canvasHeight, BAR_WIDTH, -magnitude);
108
+ }
109
+ }
110
+
111
+ rafID = window.requestAnimationFrame(updateAnalysers);
112
+ }
113
+
114
+ function toggleMono() {
115
+ if (audioInput != realAudioInput) {
116
+ audioInput.disconnect();
117
+ realAudioInput.disconnect();
118
+ audioInput = realAudioInput;
119
+ } else {
120
+ realAudioInput.disconnect();
121
+ audioInput = convertToMono(realAudioInput);
122
+ }
123
+
124
+ audioInput.connect(inputPoint);
125
+ }
126
+
127
+ function gotStream(stream) {
128
+ document.getElementById('start').removeAttribute('disabled');
129
+
130
+ inputPoint = audioContext.createGain();
131
+
132
+ // Create an AudioNode from the stream.
133
+ realAudioInput = audioContext.createMediaStreamSource(stream);
134
+ audioInput = realAudioInput;
135
+ audioInput.connect(inputPoint);
136
+
137
+ // audioInput = convertToMono( input );
138
+
139
+ analyserNode = audioContext.createAnalyser();
140
+ analyserNode.fftSize = 2048;
141
+ inputPoint.connect(analyserNode);
142
+
143
+ audioRecorder = new Recorder(inputPoint);
144
+
145
+ zeroGain = audioContext.createGain();
146
+ zeroGain.gain.value = 0.0;
147
+ inputPoint.connect(zeroGain);
148
+ zeroGain.connect(audioContext.destination);
149
+ updateAnalysers();
150
+ }
151
+
152
+ function initAudio() {
153
+ if (!navigator.getUserMedia)
154
+ navigator.getUserMedia = navigator.webkitGetUserMedia || navigator.mozGetUserMedia;
+     if (!window.cancelAnimationFrame)
+         window.cancelAnimationFrame = window.webkitCancelAnimationFrame || window.mozCancelAnimationFrame;
+     if (!window.requestAnimationFrame)
+         window.requestAnimationFrame = window.webkitRequestAnimationFrame || window.mozRequestAnimationFrame;
159
+
160
+ navigator.getUserMedia(
161
+ {
162
+ "audio": {
163
+ "mandatory": {
164
+ "googEchoCancellation": "false",
165
+ "googAutoGainControl": "false",
166
+ "googNoiseSuppression": "false",
167
+ "googHighpassFilter": "false"
168
+ },
169
+ "optional": []
170
+ },
171
+ }, gotStream, function (e) {
172
+ alert('Error getting audio');
173
+ console.log(e);
174
+ });
175
+ }
176
+
177
+ window.addEventListener('load', initAudio);
178
+
179
+ function unpause() {
180
+ document.getElementById('init').style.display = 'none';
181
+ audioContext.resume().then(() => {
182
+ console.log('Playback resumed successfully');
183
+ });
184
+ }
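The bar-drawing loop in `updateAnalysers` averages blocks of frequency bins so that narrow spikes are not skipped. A minimal sketch of that reduction as a standalone function follows. `averageBins` is a hypothetical name that does not appear in main.js, and the integer block boundaries are an assumption made here so that the divisor always matches the number of bins actually summed.

```javascript
// Hypothetical helper, not part of main.js: reduce `bins` (for example the
// Uint8Array filled by AnalyserNode.getByteFrequencyData) to `numBars`
// averaged magnitudes.
function averageBins(bins, numBars) {
  var bars = new Array(numBars);
  for (var i = 0; i < numBars; i++) {
    // Integer block boundaries avoid fractional array indices.
    var start = Math.floor(i * bins.length / numBars);
    var end = Math.max(start + 1, Math.floor((i + 1) * bins.length / numBars));
    var sum = 0;
    for (var j = start; j < end; j++) {
      sum += bins[j];
    }
    // Divide by the number of bins actually summed in this block.
    bars[i] = sum / (end - start);
  }
  return bars;
}
```

Each returned value could then be passed to `fillRect` as the bar height, in place of the inline `magnitude` computation.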
app/static/recorder.js ADDED
@@ -0,0 +1,118 @@
1
+ /*License (MIT)
2
+
3
+ Copyright © 2013 Matt Diamond
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
6
+ documentation files (the "Software"), to deal in the Software without restriction, including without limitation
7
+ the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and
8
+ to permit persons to whom the Software is furnished to do so, subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of
11
+ the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
14
+ THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
15
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
16
+ CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
17
+ DEALINGS IN THE SOFTWARE.
18
+ */
19
+
20
+ (function(window){
21
+
22
+ var WORKER_PATH = '/static/recorderWorker.js';
23
+
24
+ var Recorder = function(source, cfg){
25
+ var config = cfg || {};
26
+ var bufferLen = config.bufferLen || 4096;
27
+ this.context = source.context;
28
+ if(!this.context.createScriptProcessor){
29
+ this.node = this.context.createJavaScriptNode(bufferLen, 2, 2);
30
+ } else {
31
+ this.node = this.context.createScriptProcessor(bufferLen, 2, 2);
32
+ }
33
+
34
+ var worker = new Worker(config.workerPath || WORKER_PATH);
35
+ worker.postMessage({
36
+ command: 'init',
37
+ config: {
38
+ sampleRate: this.context.sampleRate
39
+ }
40
+ });
41
+ var recording = false,
42
+ currCallback;
43
+
44
+ this.node.onaudioprocess = function(e){
45
+ if (!recording) return;
46
+ worker.postMessage({
47
+ command: 'record',
48
+ buffer: [
49
+ e.inputBuffer.getChannelData(0),
50
+ e.inputBuffer.getChannelData(1)
51
+ ]
52
+ });
53
+ }
54
+
55
+ this.configure = function(cfg){
56
+ for (var prop in cfg){
57
+ if (cfg.hasOwnProperty(prop)){
58
+ config[prop] = cfg[prop];
59
+ }
60
+ }
61
+ }
62
+
63
+ this.record = function(){
64
+ recording = true;
65
+ }
66
+
67
+ this.stop = function(){
68
+ recording = false;
69
+ }
70
+
71
+ this.clear = function(){
72
+ worker.postMessage({ command: 'clear' });
73
+ }
74
+
75
+ this.getBuffers = function(cb) {
76
+ currCallback = cb || config.callback;
77
+ worker.postMessage({ command: 'getBuffers' })
78
+ }
79
+
80
+ this.exportWAV = function(cb, type){
81
+ currCallback = cb || config.callback;
82
+ type = type || config.type || 'audio/wav';
83
+ if (!currCallback) throw new Error('Callback not set');
84
+ worker.postMessage({
85
+ command: 'exportWAV',
86
+ type: type
87
+ });
88
+ }
89
+
90
+ this.exportMonoWAV = function(cb, type){
91
+ currCallback = cb || config.callback;
92
+ type = type || config.type || 'audio/wav';
93
+ if (!currCallback) throw new Error('Callback not set');
94
+ worker.postMessage({
95
+ command: 'exportMonoWAV',
96
+ type: type
97
+ });
98
+ }
99
+
100
+ worker.onmessage = function(e){
101
+ var blob = e.data;
102
+ currCallback(blob);
103
+ }
104
+
105
+ source.connect(this.node);
106
+ this.node.connect(this.context.destination); // if the script node is not connected to an output the "onaudioprocess" event is not triggered in chrome.
107
+ };
108
+
109
+ Recorder.setupDownload = function(blob, filename){
110
+ var url = (window.URL || window.webkitURL).createObjectURL(blob);
111
+ var link = document.getElementById("save");
112
+ link.href = url;
113
+ link.download = filename || 'output.wav';
114
+ }
115
+
116
+ window.Recorder = Recorder;
117
+
118
+ })(window);
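`Recorder` posts one pair of channel buffers to the worker per `onaudioprocess` callback, so a recording arrives as a list of fixed-size chunks that must be joined before export. A minimal sketch of that join, assuming each chunk is a `Float32Array`, is shown below. `mergeChunks` is a hypothetical name; the worker's own `mergeBuffers` takes the total length as a second argument instead of computing it.

```javascript
// Hypothetical helper, not part of recorder.js: concatenate recorded
// Float32Array chunks into a single contiguous buffer.
function mergeChunks(chunks) {
  var total = 0;
  for (var i = 0; i < chunks.length; i++) {
    total += chunks[i].length;
  }
  var result = new Float32Array(total);
  var offset = 0;
  for (var k = 0; k < chunks.length; k++) {
    // TypedArray.prototype.set copies a whole chunk at the given offset.
    result.set(chunks[k], offset);
    offset += chunks[k].length;
  }
  return result;
}
```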
app/static/recorderWorker.js ADDED
@@ -0,0 +1,161 @@
1
+ /*License (MIT)
2
+
3
+ Copyright © 2013 Matt Diamond
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
6
+ documentation files (the "Software"), to deal in the Software without restriction, including without limitation
7
+ the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and
8
+ to permit persons to whom the Software is furnished to do so, subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in all copies or substantial portions of
11
+ the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
14
+ THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
15
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
16
+ CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
17
+ DEALINGS IN THE SOFTWARE.
18
+ */
19
+
20
+ var recLength = 0,
21
+ recBuffersL = [],
22
+ recBuffersR = [],
23
+ sampleRate;
24
+
25
+ this.onmessage = function(e){
26
+ switch(e.data.command){
27
+ case 'init':
28
+ init(e.data.config);
29
+ break;
30
+ case 'record':
31
+ record(e.data.buffer);
32
+ break;
33
+ case 'exportWAV':
34
+ exportWAV(e.data.type);
35
+ break;
36
+ case 'exportMonoWAV':
37
+ exportMonoWAV(e.data.type);
38
+ break;
39
+ case 'getBuffers':
40
+ getBuffers();
41
+ break;
42
+ case 'clear':
43
+ clear();
44
+ break;
45
+ }
46
+ };
47
+
48
+ function init(config){
49
+ sampleRate = config.sampleRate;
50
+ }
51
+
52
+ function record(inputBuffer){
53
+ recBuffersL.push(inputBuffer[0]);
54
+ recBuffersR.push(inputBuffer[1]);
55
+ recLength += inputBuffer[0].length;
56
+ }
57
+
58
+ function exportWAV(type){
59
+ var bufferL = mergeBuffers(recBuffersL, recLength);
60
+ var bufferR = mergeBuffers(recBuffersR, recLength);
61
+ var interleaved = interleave(bufferL, bufferR);
62
+ var dataview = encodeWAV(interleaved);
63
+ var audioBlob = new Blob([dataview], { type: type });
64
+
65
+ this.postMessage(audioBlob);
66
+ }
67
+
68
+ function exportMonoWAV(type){
69
+ var bufferL = mergeBuffers(recBuffersL, recLength);
70
+ var dataview = encodeWAV(bufferL, true);
71
+ var audioBlob = new Blob([dataview], { type: type });
72
+
73
+ this.postMessage(audioBlob);
74
+ }
75
+
76
+ function getBuffers() {
77
+ var buffers = [];
78
+ buffers.push( mergeBuffers(recBuffersL, recLength) );
79
+ buffers.push( mergeBuffers(recBuffersR, recLength) );
80
+ this.postMessage(buffers);
81
+ }
82
+
83
+ function clear(){
84
+ recLength = 0;
85
+ recBuffersL = [];
86
+ recBuffersR = [];
87
+ }
88
+
89
+ function mergeBuffers(recBuffers, recLength){
90
+ var result = new Float32Array(recLength);
91
+ var offset = 0;
92
+ for (var i = 0; i < recBuffers.length; i++){
93
+ result.set(recBuffers[i], offset);
94
+ offset += recBuffers[i].length;
95
+ }
96
+ return result;
97
+ }
98
+
99
+ function interleave(inputL, inputR){
100
+ var length = inputL.length + inputR.length;
101
+ var result = new Float32Array(length);
102
+
103
+ var index = 0,
104
+ inputIndex = 0;
105
+
106
+ while (index < length){
107
+ result[index++] = inputL[inputIndex];
108
+ result[index++] = inputR[inputIndex];
109
+ inputIndex++;
110
+ }
111
+ return result;
112
+ }
113
+
114
+ function floatTo16BitPCM(output, offset, input){
115
+ for (var i = 0; i < input.length; i++, offset+=2){
116
+ var s = Math.max(-1, Math.min(1, input[i]));
117
+ output.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
118
+ }
119
+ }
120
+
121
+ function writeString(view, offset, string){
122
+ for (var i = 0; i < string.length; i++){
123
+ view.setUint8(offset + i, string.charCodeAt(i));
124
+ }
125
+ }
126
+
127
+ function encodeWAV(samples, mono){
128
+ var buffer = new ArrayBuffer(44 + samples.length * 2);
129
+ var view = new DataView(buffer);
130
+
131
+ /* RIFF identifier */
132
+ writeString(view, 0, 'RIFF');
+     /* RIFF chunk size (file length minus the 8-byte RIFF header) */
+     view.setUint32(4, 36 + samples.length * 2, true);
135
+ /* RIFF type */
136
+ writeString(view, 8, 'WAVE');
137
+ /* format chunk identifier */
138
+ writeString(view, 12, 'fmt ');
139
+ /* format chunk length */
140
+ view.setUint32(16, 16, true);
141
+ /* sample format (raw) */
142
+ view.setUint16(20, 1, true);
+     /* channel count */
+     view.setUint16(22, mono ? 1 : 2, true);
+     /* sample rate */
+     view.setUint32(24, sampleRate, true);
+     /* byte rate (sample rate * block align) */
+     view.setUint32(28, sampleRate * (mono ? 1 : 2) * 2, true);
+     /* block align (channel count * bytes per sample) */
+     view.setUint16(32, (mono ? 1 : 2) * 2, true);
151
+ /* bits per sample */
152
+ view.setUint16(34, 16, true);
153
+ /* data chunk identifier */
154
+ writeString(view, 36, 'data');
155
+ /* data chunk length */
156
+ view.setUint32(40, samples.length * 2, true);
157
+
158
+ floatTo16BitPCM(view, 44, samples);
159
+
160
+ return view;
161
+ }
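The header fields written by `encodeWAV` depend on each other: block align is channel count times bytes per sample, byte rate is sample rate times block align, and the RIFF chunk size is the data length plus 36. A minimal sketch of a 16-bit PCM encoder that derives all of them from the channel count follows. `encodeWavSketch` and its parameters are hypothetical names and not part of Recorderjs, and stereo input is assumed to be already interleaved.

```javascript
// Hypothetical helper, not part of Recorderjs: build a 16-bit PCM WAV file
// from Float32 samples in [-1, 1]. Stereo input is assumed to be interleaved.
function encodeWavSketch(samples, sampleRate, numChannels) {
  var bytesPerSample = 2;
  var blockAlign = numChannels * bytesPerSample;
  var dataLength = samples.length * bytesPerSample;
  var view = new DataView(new ArrayBuffer(44 + dataLength));

  function writeString(offset, text) {
    for (var i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  }

  writeString(0, 'RIFF');
  view.setUint32(4, 36 + dataLength, true);          // RIFF chunk size: file size minus 8
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);                      // fmt chunk size for PCM
  view.setUint16(20, 1, true);                       // audio format 1 = linear PCM
  view.setUint16(22, numChannels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // byte rate
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 8 * bytesPerSample, true);      // bits per sample
  writeString(36, 'data');
  view.setUint32(40, dataLength, true);

  // Clamp each sample and scale it to the signed 16-bit range.
  for (var i = 0; i < samples.length; i++) {
    var s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
  }
  return view;
}
```

Deriving byte rate and block align from `numChannels` keeps the mono and stereo paths consistent, which matters here because main.js only ever calls `exportMonoWAV`.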
app/static/text_to_speech.png ADDED
app/templates/audio_to_text.html ADDED
@@ -0,0 +1,46 @@
1
+ <!DOCTYPE html>
2
+ <html>
3
+ <head>
4
+ <meta name="viewport" content="width=device-width, initial-scale=1">
5
+
6
+ <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
7
+
8
+ <link rel="stylesheet" type="text/css" href="/static/header_file.css">
9
+ <link rel="stylesheet" type="text/css" href="/static/footer_file.css">
10
+ <link rel="stylesheet" type="text/css" href="/static/main.css">
11
+ <link rel="stylesheet" type="text/css" href="/static/audio_to_text.css">
12
+ <script src="/static/recorder.js"></script>
13
+ <script src="/static/audiodisplay.js"></script>
14
+ <script src="/static/main.js"></script>
15
+ <script src="/static/recorderWorker.js"></script>
16
+
17
+ </head>
18
+ <body style="background-color:rgb(127, 195, 255);">
19
+ {% include "header.html" %}
20
+
21
+ <div id="loader">
22
+ <p class="glow">SPEECH to TEXT</p>
23
+ </div>
24
+
25
+ <div id="loader">
26
+ <div class="row " style="position:relative;">
27
+ <div class="column1">
28
+ <h2>Record Audio</h2>
29
+ <p>
30
+ <button id="start" class="btn btn-success" onclick="startRecording()" disabled>Start</button>
31
+ <button id="stop" class="btn btn-danger" onclick="stopRecording()" disabled>Stop</button>
32
+ </p>
33
+ </div>
34
+ <div class="column2" >
35
+ <h2>Audio Prediction</h2>
36
+ <p class="glow" id="output" style="font-size: 20px;"></p>
37
+
38
+
39
+ </div>
40
+ </div>
41
+ </div>
42
+ {% extends "footer.html" %}
43
+ {% block footer%}fixed{% endblock %}
44
+ </body>
45
+
46
+ </html>
app/templates/footer.html ADDED
@@ -0,0 +1,11 @@
1
+
2
+ <div class="footnov" id="myFootnav" style="position: {% block footer%}{% endblock %}; bottom: 0; width: 100%;">
3
+ <a href="" >Follow me : </a>
4
+
5
+ <a href="https://www.facebook.com/vatsal.parasaniya" style="background: #3B5998; color: white;" class="fa fa-facebook"></a>
6
+ <a href="https://github.com/Vatsalparsaniya" style="background: black; color: white;" class="fa fa-github"></a>
7
+ <a href="https://twitter.com/VatsalParsaniya" style="background: #55ACEE; color: white;" class="fa fa-twitter"></a>
8
+ <a href="https://www.linkedin.com/in/vatsal-parsaniya/" style="background: #007bb5; color: white;" class="fa fa-linkedin"></a>
9
+ </div>
10
+
11
+
app/templates/header.html ADDED
@@ -0,0 +1,22 @@
1
+
2
+ <div class="topnav" id="myTopnav">
3
+ <a href="/" class="active">Home</a>
4
+ <a href="/audio_to_text/">Speech to Text</a>
5
+ <a href="javascript:void(0);" class="icon" onclick="myFunction()">
6
+ <i class="fa fa-bars"></i>
7
+ </a>
8
+ </div>
9
+
10
+
11
+
12
+ {% with messages = get_flashed_messages(with_categories=true) %}
13
+ {% if messages %}
14
+ {% for category,message in messages %}
15
+
16
+ <div class="alert">
17
+ <span class="closebtn" onclick="this.parentElement.style.display='none';">&times;</span>
18
+ <strong>{{category}} :</strong> {{ message }}
19
+ </div>
20
+ {% endfor %}
21
+ {% endif %}
22
+ {% endwith %}
app/templates/index.html ADDED
@@ -0,0 +1,38 @@
1
+ <!DOCTYPE html>
2
+ <html>
3
+ <head>
4
+ <meta name="viewport" content="width=device-width, initial-scale=1">
5
+ <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
6
+ <link rel="stylesheet" type="text/css" href="/static/header_file.css">
7
+ <link rel="stylesheet" type="text/css" href="/static/footer_file.css">
8
+ <link rel="stylesheet" type="text/css" href="/static/main.css">
9
+ <link href="https://unpkg.com/tailwindcss@^1.0/dist/tailwind.min.css" rel="stylesheet">
10
+ </head>
11
+ <body style="background-color:rgb(127, 195, 255);">
12
+ {% include "header.html" %}
13
+ <section class="text-gray-700 body-font">
14
+
15
+ <div class="container mx-auto flex px-5 py-24 md:flex-row flex-col items-center">
16
+ <div class="lg:max-w-lg lg:w-full md:w-1/2 w-5/6" style="padding-right: 10px;">
17
+ <img class="object-cover object-center rounded" alt="hero" src="static/text_to_speech.png">
18
+ </div>
19
+ <div class="lg:flex-grow md:w-1/2 lg:pr-24 md:pr-16 flex flex-col md:items-start md:text-left mb-16 md:mb-0 items-center text-center">
20
+ <h1 class="title-font sm:text-4xl text-3xl mb-4 font-medium text-gray-900">Speech To Text Conversion</h1>
+ <p class="title-font sm:text-2xl text-xl mb-4 font-medium text-gray-900">(Flask + Heroku + SpeechRecognition + Recorderjs)</p>
+
+ <p class="mb-8 leading-relaxed"><b>Flask</b> is a micro web framework written in Python.</p>
+ <p class="mb-8 leading-relaxed"><b>SpeechRecognition</b> is a library for performing speech recognition, with support for several engines and APIs, online and offline.</p>
+ <p class="mb-8 leading-relaxed"><b>Heroku</b> is a platform as a service (PaaS) that enables developers to build, run, and operate applications entirely in the cloud.</p>
+ <p class="mb-8 leading-relaxed"><b>Recorderjs</b> is a plugin for recording and exporting the output of Web Audio API nodes.</p>
27
+ <div class="flex justify-center">
+ <form action="/audio_to_text/"><button class="inline-flex text-white bg-indigo-500 border-0 py-2 px-6 focus:outline-none hover:bg-indigo-600 rounded text-lg">Try It Here</button></form>
29
+ </div>
30
+ </div>
31
+
32
+ </div>
33
+ </section>
34
+
35
+ </body>
36
+ {% extends "footer.html" %}
37
+ {% block footer%}fixed{% endblock %}
38
+ </html>
app/tmp/audio.wav ADDED
Binary file (164 kB).
 
app/tmp/test.mp3 ADDED
Binary file (166 kB).
 
app/wsgi.py ADDED
@@ -0,0 +1,4 @@
1
+ from app import app
2
+
3
+ if __name__ == '__main__':
4
+ app.run()
requirements.txt ADDED
@@ -0,0 +1,7 @@
1
+ transformers
2
+ accelerate
3
+ torch
+ openai-whisper
5
+ Flask
6
+ Flask-Cors
7
+ faster-whisper