import streamlit as st

# Custom CSS for better styling
st.markdown("""
    <style>
        .main-title {
            font-size: 36px;
            color: #4A90E2;
            font-weight: bold;
            text-align: center;
        }
        .sub-title {
            font-size: 24px;
            color: #4A90E2;
            margin-top: 20px;
        }
        .section {
            background-color: #f9f9f9;
            padding: 15px;
            border-radius: 10px;
            margin-top: 20px;
        }
        .section p, .section ul {
            color: #666666;
        }
        .link {
            color: #4A90E2;
            text-decoration: none;
        }
        .benchmark-table {
            width: 100%;
            border-collapse: collapse;
            margin-top: 20px;
        }
        .benchmark-table th, .benchmark-table td {
            border: 1px solid #ddd;
            padding: 8px;
            text-align: left;
        }
        .benchmark-table th {
            background-color: #4A90E2;
            color: white;
        }
        .benchmark-table td {
            background-color: #f2f2f2;
        }
    </style>
""", unsafe_allow_html=True)

# Main Title
st.markdown('<div class="main-title">Wav2Vec2 for Speech Recognition</div>', unsafe_allow_html=True)

# Description
st.markdown("""
<div class="section">
    <p><strong>Wav2Vec2</strong> is an Automatic Speech Recognition (ASR) model that learns speech representations directly from raw audio. It is pre-trained on unlabeled speech and then fine-tuned on a comparatively small amount of transcribed audio, which makes it well suited to low-resource settings. Adapted for Spark NLP, Wav2Vec2 can run as a stage in a distributed pipeline for scalable, production-ready ASR applications.</p>
</div>
""", unsafe_allow_html=True)

# Why, Where, and When to Use Wav2Vec2
st.markdown('<div class="sub-title">Why, Where, and When to Use Wav2Vec2</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <p>Use <strong>Wav2Vec2</strong> when you need an ASR model that performs well with limited labeled data and that must scale to large volumes of audio. Typical use cases include:</p>
    <ul>
        <li><strong>Transcription Services:</strong> Convert large volumes of speech into text for the media, legal, and healthcare industries.</li>
        <li><strong>Voice-Activated Assistants:</strong> Improve the recognition of voice commands in smart devices and personal assistants.</li>
        <li><strong>Meeting Summarization:</strong> Transcribe meetings automatically so they can be summarized and reviewed by people who could not attend.</li>
        <li><strong>Language Learning Tools:</strong> Help learners improve their pronunciation with speech-to-text feedback.</li>
        <li><strong>Accessibility Enhancements:</strong> Generate captions for videos and live events, making content accessible to people who are deaf or hard of hearing.</li>
        <li><strong>Call Center Analytics:</strong> Analyze customer interactions for insights and quality monitoring.</li>
    </ul>
</div>
""", unsafe_allow_html=True)

# How to Use the Model
st.markdown('<div class="sub-title">How to Use the Model</div>', unsafe_allow_html=True)
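st.code('''
from pyspark.ml import Pipeline
from sparknlp.base import AudioAssembler
from sparknlp.annotator import Wav2Vec2ForCTC

# Wrap the raw audio column in Spark NLP's audio annotation format
audio_assembler = AudioAssembler() \\
    .setInputCol("audio_content") \\
    .setOutputCol("audio_assembler")

# Load the pretrained English Wav2Vec2 model
speech_to_text = Wav2Vec2ForCTC \\
    .pretrained("asr_wav2vec2_large_xlsr_53_english_by_jonatasgrosman", "en") \\
    .setInputCols(["audio_assembler"]) \\
    .setOutputCol("text")

pipeline = Pipeline(stages=[
    audio_assembler,
    speech_to_text,
])

# audioDf is assumed to be a Spark DataFrame whose "audio_content" column
# holds each recording as an array of float samples
pipelineModel = pipeline.fit(audioDf)
pipelineDF = pipelineModel.transform(audioDf)
''', language='python')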

# Best Practices & Tips
st.markdown('<div class="sub-title">Best Practices & Tips</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <ul>
        <li><strong>Preprocessing:</strong> Reduce background noise and normalize audio levels before transcription; clean input audio gives the best results.</li>
        <li><strong>Fine-tuning:</strong> For specific domains or languages, consider fine-tuning the model on your own dataset to improve accuracy.</li>
        <li><strong>Batch Processing:</strong> Use Spark NLP's distributed processing to handle large-scale audio datasets efficiently.</li>
        <li><strong>Model Evaluation:</strong> Regularly measure performance on your own data with metrics such as Word Error Rate (WER) to confirm the model meets your accuracy requirements.</li>
        <li><strong>Resource Management:</strong> In production, monitor memory and compute usage, especially for large models, to balance performance and cost.</li>
    </ul>
</div>
""", unsafe_allow_html=True)

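# Illustrative sketch for the "Model Evaluation" tip above: Word Error Rate is the
# word-level edit distance (substitutions + deletions + insertions) between a
# reference transcript and the model output, divided by the number of reference
# words. `word_error_rate` is a hypothetical helper written for this page, not
# part of Spark NLP; it assumes plain whitespace tokenization and lowercasing,
# which may be too simple for punctuation-heavy transcripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # previous[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words (one row of the Levenshtein table).
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current

    return previous[-1] / len(ref)
```

# Example use: word_error_rate(reference_text, transcribed_text), where the
# second argument would be the string taken from the pipeline's "text" column.
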
# Model Information
st.markdown('<div class="sub-title">Model Information</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <table class="benchmark-table">
        <tr>
            <th>Attribute</th>
            <th>Description</th>
        </tr>
        <tr>
            <td><strong>Model Name</strong></td>
            <td>asr_wav2vec2_large_xlsr_53_english_by_jonatasgrosman</td>
        </tr>
        <tr>
            <td><strong>Compatibility</strong></td>
            <td>Spark NLP 4.2.0+</td>
        </tr>
        <tr>
            <td><strong>License</strong></td>
            <td>Open Source</td>
        </tr>
        <tr>
            <td><strong>Edition</strong></td>
            <td>Official</td>
        </tr>
        <tr>
            <td><strong>Input Labels</strong></td>
            <td>[audio_assembler]</td>
        </tr>
        <tr>
            <td><strong>Output Labels</strong></td>
            <td>[text]</td>
        </tr>
        <tr>
            <td><strong>Language</strong></td>
            <td>en</td>
        </tr>
        <tr>
            <td><strong>Size</strong></td>
            <td>1.2 GB</td>
        </tr>
    </table>
</div>
""", unsafe_allow_html=True)

# Data Source Section
st.markdown('<div class="sub-title">Data Source</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <p>The Wav2Vec2 model is available on <a class="link" href="https://huggingface.co/jonatasgrosman/asr_wav2vec2_large_xlsr_53_english" target="_blank">Hugging Face</a>. It was trained by <em>jonatasgrosman</em> and has been adapted for Spark NLP so that it can be used in large-scale, distributed applications.</p>
</div>
""", unsafe_allow_html=True)

# Conclusion
st.markdown('<div class="sub-title">Conclusion</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <p><strong>Wav2Vec2</strong> is a versatile ASR model that performs well even when labeled data is limited. Its integration with Spark NLP allows scalable, efficient deployment in real-world applications, from transcription services to voice-activated systems.</p>
</div>
""", unsafe_allow_html=True)

# References
st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://sparknlp.org/2022/09/24/asr_wav2vec2_large_xlsr_53_english_by_jonatasgrosman_en.html" target="_blank">Wav2Vec2 Model on Spark NLP</a></li>
        <li><a class="link" href="https://huggingface.co/jonatasgrosman/asr_wav2vec2_large_xlsr_53_english" target="_blank">Wav2Vec2 Model on Hugging Face</a></li>
        <li><a class="link" href="https://arxiv.org/abs/2006.11477" target="_blank">wav2vec 2.0 Paper</a></li>
        <li><a class="link" href="https://github.com/pytorch/fairseq/tree/master/examples/wav2vec" target="_blank">Wav2Vec2 GitHub Repository</a></li>
    </ul>
</div>
""", unsafe_allow_html=True)

# Community & Support
st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Comprehensive documentation and examples.</li>
        <li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Join the community for live discussions and support.</li>
        <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Report issues, request features, and contribute to the project.</li>
        <li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Read articles and tutorials about Spark NLP.</li>
        <li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Watch video tutorials and demonstrations.</li>
    </ul>
</div>
""", unsafe_allow_html=True)