import streamlit as st
# Custom CSS for better styling
st.markdown("""
<style>
.main-title {
font-size: 36px;
color: #4A90E2;
font-weight: bold;
text-align: center;
}
.sub-title {
font-size: 24px;
color: #4A90E2;
margin-top: 20px;
}
.section {
background-color: #f9f9f9;
padding: 15px;
border-radius: 10px;
margin-top: 20px;
}
.section p, .section ul {
color: #666666;
}
.link {
color: #4A90E2;
text-decoration: none;
}
.benchmark-table {
width: 100%;
border-collapse: collapse;
margin-top: 20px;
}
.benchmark-table th, .benchmark-table td {
border: 1px solid #ddd;
padding: 8px;
text-align: left;
}
.benchmark-table th {
background-color: #4A90E2;
color: white;
}
.benchmark-table td {
background-color: #f2f2f2;
}
</style>
""", unsafe_allow_html=True)
# Main Title
st.markdown('<div class="main-title">Wav2Vec2 for Speech Recognition</div>', unsafe_allow_html=True)
# Description
st.markdown("""
<div class="section">
<p><strong>Wav2Vec2</strong> (wav2vec 2.0) is a self-supervised model for Automatic Speech Recognition (ASR) that learns speech representations directly from raw audio. Because it is pretrained on unlabeled speech, it can be fine-tuned for transcription with relatively little labeled data, which makes it well suited to low-resource settings. Spark NLP provides it as the <code>Wav2Vec2ForCTC</code> annotator, so transcription can run as a scalable, production-ready Spark pipeline.</p>
</div>
""", unsafe_allow_html=True)
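# Example: frame rate of the wav2vec 2.0 feature encoder
st.markdown("""
<div class="section">
<p>Wav2Vec2 reads raw waveform samples rather than hand-crafted features: a stack of 1D convolutions (the feature encoder) turns 16 kHz audio into one latent vector about every 20 ms, each covering roughly 25 ms of audio, and the Transformer layers then operate on those vectors. The minimal sketch below computes how many frames a clip produces. The kernel sizes and strides are the defaults of Hugging Face's <code>Wav2Vec2Config</code>; that this Spark NLP checkpoint uses the same values is an assumption.</p>
</div>
""", unsafe_allow_html=True)
st.markdown("""
```python
# Minimal sketch: frame count of the wav2vec 2.0 convolutional feature encoder.
# Kernel sizes and strides are the Hugging Face Wav2Vec2Config defaults
# (assumed here, not read from the Spark NLP model).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000  # the model expects 16 kHz mono input


def num_output_frames(num_samples: int) -> int:
    # Apply the output-length rule of an unpadded 1D convolution, layer by layer.
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        if length < kernel:
            return 0  # clip is shorter than the encoder's receptive field
        length = (length - kernel) // stride + 1
    return length


frames = num_output_frames(SAMPLE_RATE)  # one second of audio
print(f"{frames} frames per second, about {1000 / frames:.1f} ms per frame")
```

Running it prints `49 frames per second, about 20.4 ms per frame`, consistent with the roughly 20 ms frame stride.
""")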
# Why, Where, and When to Use Wav2Vec2
st.markdown('<div class="sub-title">Why, Where, and When to Use Wav2Vec2</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<p>Use <strong>Wav2Vec2</strong> when you need a robust ASR solution that works well with limited labeled data. It suits speech-to-text applications that must process large volumes of audio accurately. Ideal use cases include:</p>
<ul>
<li><strong>Transcription Services:</strong> Efficiently convert large volumes of speech into text, vital for media, legal, and healthcare industries.</li>
<li><strong>Voice-Activated Assistants:</strong> Enhance the accuracy of voice commands in smart devices and personal assistants.</li>
<li><strong>Meeting Summarization:</strong> Transcribe meetings automatically so the transcripts can be reviewed or passed to a summarization model, helping absentees catch up.</li>
<li><strong>Language Learning Tools:</strong> Assist learners in improving pronunciation by providing real-time speech-to-text feedback.</li>
<li><strong>Accessibility Enhancements:</strong> Generate captions for videos and live events, making content accessible to people who are deaf or hard of hearing.</li>
<li><strong>Call Center Analytics:</strong> Analyze customer interactions for insights and quality monitoring.</li>
</ul>
</div>
""", unsafe_allow_html=True)
# How to Use the Model
st.markdown('<div class="sub-title">How to Use the Model</div>', unsafe_allow_html=True)
st.code('''
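import sparknlp
import librosa
from sparknlp.base import AudioAssembler
from sparknlp.annotator import Wav2Vec2ForCTC
from pyspark.ml import Pipeline
from pyspark.sql.types import StructType, StructField, ArrayType, FloatType
spark = sparknlp.start()
# Load a 16 kHz mono waveform ("sample.wav" is a placeholder path)
data, _ = librosa.load("sample.wav", sr=16000)
schema = StructType([StructField("audio_content", ArrayType(FloatType()))])
audioDf = spark.createDataFrame([[[float(x) for x in data]]], schema)

audio_assembler = AudioAssembler() \\
    .setInputCol("audio_content") \\
    .setOutputCol("audio_assembler")
speech_to_text = Wav2Vec2ForCTC \\
    .pretrained("asr_wav2vec2_large_xlsr_53_english_by_jonatasgrosman", "en") \\
    .setInputCols("audio_assembler") \\
    .setOutputCol("text")

pipeline = Pipeline(stages=[audio_assembler, speech_to_text])
pipelineModel = pipeline.fit(audioDf)
pipelineDF = pipelineModel.transform(audioDf)
# Show the transcription for each audio row
pipelineDF.select("text.result").show(truncate=False)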
''', language='python')
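# Example: greedy CTC decoding
st.markdown("""
<div class="section">
<p><code>Wav2Vec2ForCTC</code> ends in a Connectionist Temporal Classification (CTC) head: for every audio frame it scores each token in a vocabulary of characters, and these frame-level predictions are then collapsed into text. The sketch below illustrates greedy (best-path) CTC decoding with a hypothetical toy vocabulary and made-up scores; it shows the general technique, not Spark NLP's internal code. Using <code>&lt;pad&gt;</code> as the blank and <code>|</code> as the word delimiter follows the Hugging Face Wav2Vec2 tokenizer convention and is an assumption here.</p>
</div>
""", unsafe_allow_html=True)
st.markdown("""
```python
# Illustrative greedy (best-path) CTC decoding on a toy vocabulary.
# "<pad>" as the blank and "|" as the word delimiter are assumptions
# borrowed from the Hugging Face Wav2Vec2 tokenizer convention.
from itertools import groupby
from typing import List, Sequence

VOCAB = ["<pad>", "|", "e", "h", "l", "o"]  # hypothetical vocabulary


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]], vocab: List[str],
                      blank: str = "<pad>", word_delimiter: str = "|") -> str:
    # 1. Pick the highest-scoring token index for every frame.
    best = [max(range(len(scores)), key=lambda i: scores[i]) for scores in frame_scores]
    # 2. Merge consecutive repeats of the same index.
    collapsed = [index for index, _ in groupby(best)]
    # 3. Drop blanks and turn word delimiters into spaces.
    chars = [vocab[i] for i in collapsed if vocab[i] != blank]
    return "".join(" " if c == word_delimiter else c for c in chars).strip()


def one_hot(index: int) -> List[float]:
    # Fake per-frame scores that strongly favour a single token.
    return [0.9 if i == index else 0.02 for i in range(len(VOCAB))]


# Frames spell h h e l l <blank> l o o: the blank keeps the two l's apart.
frame_scores = [one_hot(i) for i in (3, 3, 2, 4, 4, 0, 4, 5, 5)]
print(greedy_ctc_decode(frame_scores, VOCAB))  # hello
```

Without the blank frame between them, the two `l` frames would merge and the output would be `helo`.
""")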
# Best Practices & Tips
st.markdown('<div class="sub-title">Best Practices & Tips</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<ul>
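<li><strong>Preprocessing:</strong> Resample audio to 16 kHz mono before inference, since the model expects 16 kHz input. Normalizing audio levels and reducing strong background noise can further improve transcription quality.</li>
<li><strong>Fine-tuning:</strong> For specific domains or accents, consider fine-tuning the model on your own labeled dataset to improve accuracy. Spark NLP runs inference only, so fine-tuning happens outside it (for example, with Hugging Face Transformers) before the model is imported.</li>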
<li><strong>Batch Processing:</strong> Leverage Spark NLP's distributed processing capabilities to handle large-scale audio datasets efficiently.</li>
<li><strong>Model Evaluation:</strong> Regularly evaluate the model's performance on your specific use case using metrics like Word Error Rate (WER) to ensure it meets your accuracy requirements.</li>
<li><strong>Resource Management:</strong> When deploying in production, monitor resource usage, especially for large models, to optimize performance and cost.</li>
</ul>
</div>
""", unsafe_allow_html=True)
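# Example: computing Word Error Rate
st.markdown("""
<div class="section">
<p>Word Error Rate counts the word substitutions (S), deletions (D), and insertions (I) needed to turn a transcript into its reference, divided by the number of reference words (N): WER = (S + D + I) / N. The minimal sketch below computes it with a word-level Levenshtein distance after lowercasing. Libraries such as <code>jiwer</code> implement the same metric, and you may want further text normalization (for example, stripping punctuation) before scoring.</p>
</div>
""", unsafe_allow_html=True)
st.markdown("""
```python
# Minimal sketch: WER = (substitutions + deletions + insertions) / reference words,
# computed with a word-level Levenshtein distance. Only lowercasing is applied.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the previous reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.3f}")  # 0.333
```

For this pair the result is one substitution (`sat` to `sit`) plus one deletion (`the`) over six reference words.
""")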
# Model Information
st.markdown('<div class="sub-title">Model Information</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<table class="benchmark-table">
<tr>
<th>Attribute</th>
<th>Description</th>
</tr>
<tr>
<td><strong>Model Name</strong></td>
<td>asr_wav2vec2_large_xlsr_53_english_by_jonatasgrosman</td>
</tr>
<tr>
<td><strong>Compatibility</strong></td>
<td>Spark NLP 4.2.0+</td>
</tr>
<tr>
<td><strong>License</strong></td>
<td>Open Source</td>
</tr>
<tr>
<td><strong>Edition</strong></td>
<td>Official</td>
</tr>
<tr>
<td><strong>Input Labels</strong></td>
<td>[audio_assembler]</td>
</tr>
<tr>
<td><strong>Output Labels</strong></td>
<td>[text]</td>
</tr>
<tr>
<td><strong>Language</strong></td>
<td>en</td>
</tr>
<tr>
<td><strong>Size</strong></td>
<td>1.2 GB</td>
</tr>
</table>
</div>
""", unsafe_allow_html=True)
# Data Source Section
st.markdown('<div class="sub-title">Data Source</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<p>The model is available on <a class="link" href="https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english" target="_blank">Hugging Face</a>. It was fine-tuned by <em>jonatasgrosman</em> from Facebook's <code>wav2vec2-large-xlsr-53</code> checkpoint on English speech from Common Voice, then imported into Spark NLP so it can run inside distributed Spark pipelines.</p>
</div>
""", unsafe_allow_html=True)
# Conclusion
st.markdown('<div class="sub-title">Conclusion</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<p><strong>Wav2Vec2</strong> is a versatile ASR model that performs well even when labeled data is limited. Integrated with Spark NLP, it can be deployed in a scalable, efficient way across real-world applications, from transcription services to voice-activated systems.</p>
</div>
""", unsafe_allow_html=True)
# References
st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<ul>
<li><a class="link" href="https://sparknlp.org/2022/09/24/asr_wav2vec2_large_xlsr_53_english_by_jonatasgrosman_en.html" target="_blank">Wav2Vec2 Model on Spark NLP</a></li>
<li><a class="link" href="https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-english" target="_blank">Wav2Vec2 Model on Hugging Face</a></li>
<li><a class="link" href="https://arxiv.org/abs/2006.11477" target="_blank">wav2vec 2.0 Paper</a></li>
<li><a class="link" href="https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec" target="_blank">Wav2Vec2 GitHub Repository</a></li>
</ul>
</div>
""", unsafe_allow_html=True)
# Community & Support
st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<ul>
<li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Comprehensive documentation and examples.</li>
<li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Join the community for live discussions and support.</li>
<li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Report issues, request features, and contribute to the project.</li>
<li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Read articles and tutorials about Spark NLP.</li>
<li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Watch video tutorials and demonstrations.</li>
</ul>
</div>
""", unsafe_allow_html=True)