import streamlit as st
# Custom CSS for better styling
st.markdown("""
<style>
.main-title {
font-size: 36px;
color: #4A90E2;
font-weight: bold;
text-align: center;
}
.sub-title {
font-size: 24px;
color: #4A90E2;
margin-top: 20px;
}
.section {
background-color: #f9f9f9;
padding: 15px;
border-radius: 10px;
margin-top: 20px;
}
.section h2 {
font-size: 22px;
color: #4A90E2;
}
.section p, .section ul {
color: #666666;
}
.link {
color: #4A90E2;
text-decoration: none;
}
.benchmark-table {
width: 100%;
border-collapse: collapse;
margin-top: 20px;
}
.benchmark-table th, .benchmark-table td {
border: 1px solid #ddd;
padding: 8px;
text-align: left;
}
.benchmark-table th {
background-color: #4A90E2;
color: white;
}
.benchmark-table td {
background-color: #f2f2f2;
}
</style>
""", unsafe_allow_html=True)
# Main Title
st.markdown('<div class="main-title">HuBERT for Speech Recognition</div>', unsafe_allow_html=True)
# Introduction
st.markdown("""
<div class="section">
<p><strong>HuBERT</strong> (Hidden-Unit BERT) is a self-supervised speech representation model introduced in the paper <em>HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units</em> by Wei-Ning Hsu et al. During pre-training, an offline clustering step assigns each audio frame a discrete "hidden unit" label, and the model learns to predict those labels for masked spans of the input. This lets it learn both acoustic and language-level representations from unsegmented, unannotated audio.</p>
</div>
""", unsafe_allow_html=True)
# Why, Where, and When to Use HuBERT
st.markdown('<div class="sub-title">Why, Where, and When to Use HuBERT</div>', unsafe_allow_html=True)
# Explanation Section
st.markdown("""
<div class="section">
<p><strong>HuBERT</strong> is a good fit when you need accurate speech-to-text conversion or robust speech representations. Because pre-training requires no transcripts, the model can draw on large amounts of unannotated audio, and it is suited to tasks where recordings may be noisy. Key use cases include:</p>
</div>
""", unsafe_allow_html=True)
# Use Cases Section
st.markdown('<div class="sub-title">Use Cases</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<ul>
<li><strong>Noisy Environment Transcription:</strong> Ideal for transcribing speech in noisy or challenging audio environments, such as call centers or field recordings.</li>
<li><strong>Preprocessing for NLP Tasks:</strong> Converts spoken language into text for NLP tasks like sentiment analysis, topic modeling, or entity recognition.</li>
<li><strong>Audio Content Analysis:</strong> Efficiently analyzes large volumes of audio content, enabling keyword extraction and content summarization.</li>
<li><strong>Language Model Enhancement:</strong> Supplies robust speech representations and transcripts to downstream language systems, which can improve accuracy in applications such as speech translation and voice-activated systems.</li>
</ul>
</div>
""", unsafe_allow_html=True)
# How to Use the Model
st.markdown('<div class="sub-title">HuBERT Pipeline in Spark NLP</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<p>To use the HuBERT model in Spark NLP, follow the example code below. This code demonstrates how to assemble audio data and apply the HubertForCTC annotator to convert speech to text.</p>
</div>
""", unsafe_allow_html=True)
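st.code('''
from sparknlp.base import AudioAssembler
from sparknlp.annotator import HubertForCTC
from pyspark.ml import Pipeline

# Wrap the raw float samples in "audio_content" as audio annotations
audio_assembler = AudioAssembler()\\
    .setInputCol("audio_content")\\
    .setOutputCol("audio_assembler")

# Transcribe the audio with the pretrained HuBERT CTC model
speech_to_text = HubertForCTC.pretrained("asr_hubert_large_ls960", "en")\\
    .setInputCols(["audio_assembler"])\\
    .setOutputCol("text")

pipeline = Pipeline(stages=[
    audio_assembler,
    speech_to_text,
])

# audioDf: an existing DataFrame whose "audio_content" column holds each clip as an array of floats
pipelineModel = pipeline.fit(audioDf)
pipelineDF = pipelineModel.transform(audioDf)
''', language='python')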
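# Preparing the input samples (illustrative sketch, not part of the original model card)
st.markdown("""
<div class="section">
<p>The pipeline above expects an existing <code>audioDf</code> whose <code>audio_content</code> column holds each clip as an array of floats. As a minimal sketch, assuming every clip is a mono, 16-bit PCM WAV file, those samples could be read with the Python standard library alone. The helper name <code>load_wav_as_floats</code> and the file name <code>clip.wav</code> are hypothetical, and the commented Spark step assumes an active <code>SparkSession</code> called <code>spark</code>.</p>
</div>

```python
import struct
import wave


def load_wav_as_floats(path):
    # Read a mono, 16-bit PCM WAV file and return (samples, sample_rate),
    # with samples scaled to floats in the range [-1.0, 1.0).
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    count = len(raw) // 2
    ints = struct.unpack("<%dh" % count, raw[: count * 2])
    return [value / 32768.0 for value in ints], sample_rate


# Assumed next step, not executed here: it needs an active SparkSession
# named `spark` and a real file path.
#   samples, sample_rate = load_wav_as_floats("clip.wav")
#   audioDf = spark.createDataFrame([[samples]]).toDF("audio_content")
```

In practice an audio library such as librosa or soundfile is a more common choice, since it also handles other formats and channel layouts. The returned sample rate is worth checking against the rate the model expects before building the DataFrame.
""", unsafe_allow_html=True)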
# Model Information
st.markdown('<div class="sub-title">Model Information</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<table class="benchmark-table">
<tr>
<th>Attribute</th>
<th>Description</th>
</tr>
<tr>
<td><strong>Model Name</strong></td>
<td>asr_hubert_large_ls960</td>
</tr>
<tr>
<td><strong>Compatibility</strong></td>
<td>Spark NLP 4.3.0+</td>
</tr>
<tr>
<td><strong>License</strong></td>
<td>Open Source</td>
</tr>
<tr>
<td><strong>Edition</strong></td>
<td>Official</td>
</tr>
<tr>
<td><strong>Input Labels</strong></td>
<td>[audio_assembler]</td>
</tr>
<tr>
<td><strong>Output Labels</strong></td>
<td>[text]</td>
</tr>
<tr>
<td><strong>Language</strong></td>
<td>en</td>
</tr>
<tr>
<td><strong>Size</strong></td>
<td>1.5 GB</td>
</tr>
</table>
</div>
""", unsafe_allow_html=True)
# Data Source Section
st.markdown('<div class="sub-title">Data Source</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<p>The original checkpoint, <code>facebook/hubert-large-ls960-ft</code>, is available on <a class="link" href="https://huggingface.co/facebook/hubert-large-ls960-ft" target="_blank">Hugging Face</a>. It was fine-tuned on 960 hours of LibriSpeech data and expects speech audio sampled at 16 kHz. Make sure your input audio uses the same sampling rate, and resample it first if it does not.</p>
</div>
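<div class="section">
<p>A recording at any other sampling rate has to be resampled to 16 kHz before it reaches the pipeline. The function below is a deliberately crude sketch that uses linear interpolation in pure Python. The name <code>resample_linear</code> is hypothetical, both rates are assumed to be positive integers, and no anti-aliasing filter is applied, so a dedicated resampler (for example <code>librosa.load(path, sr=16000)</code>) is preferable for real data.</p>
</div>

```python
def resample_linear(samples, src_rate, dst_rate=16000):
    # Resample a list of float samples from src_rate to dst_rate by linear
    # interpolation between neighbouring input samples. No low-pass filter
    # is applied, so downsampling can alias high frequencies.
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = (len(samples) * dst_rate) // src_rate
    step = src_rate / dst_rate  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)
        hi = min(lo + 1, last)  # clamp at the final sample
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out
```

For example, `resample_linear(samples, 8000)` doubles the number of samples in an 8 kHz clip, while `resample_linear(samples, 32000)` keeps every second one.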
""", unsafe_allow_html=True)
# Conclusion
st.markdown('<div class="sub-title">Conclusion</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<p><strong>HuBERT</strong> offers a powerful approach to speech recognition built on self-supervised pre-training, especially in challenging audio environments. Its ability to learn from unannotated data by predicting masked hidden units makes it a robust model for a variety of speech-related tasks. Integrated into Spark NLP, HuBERT can be deployed at scale in distributed transcription pipelines.</p>
<p>If you’re working on speech recognition projects that require resilience to noise and variability, HuBERT provides an advanced, scalable option.</p>
</div>
""", unsafe_allow_html=True)
# References
st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<ul>
<li><a class="link" href="https://sparknlp.org/2023/02/07/asr_hubert_large_ls960_en.html" target="_blank">HuBERT Model on Spark NLP</a></li>
<li><a class="link" href="https://huggingface.co/facebook/hubert-large-ls960-ft" target="_blank">HuBERT Model on Hugging Face</a></li>
<li><a class="link" href="https://github.com/pytorch/fairseq/tree/master/examples/hubert" target="_blank">HuBERT GitHub Repository</a></li>
<li><a class="link" href="https://arxiv.org/abs/2106.07447" target="_blank">HuBERT Paper on arXiv</a></li>
</ul>
</div>
""", unsafe_allow_html=True)
# Community & Support
st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<ul>
<li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Documentation and examples</li>
<li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Live discussion with the community and team</li>
<li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Bug reports, feature requests, and contributions</li>
<li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Spark NLP articles</li>
<li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Video tutorials</li>
</ul>
</div>
""", unsafe_allow_html=True)