import streamlit as st
# Custom CSS for better styling
st.markdown("""
""", unsafe_allow_html=True)
# Main Title
st.markdown('# Wav2Vec2 for Speech Recognition', unsafe_allow_html=True)
# Description
st.markdown("""
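Wav2Vec2 is an Automatic Speech Recognition (ASR) model that learns speech representations directly from raw audio waveforms. It is first pretrained with self-supervision on unlabeled speech and then fine-tuned on a comparatively small amount of transcribed audio, which makes it a strong choice for low-resource settings where labeled data is scarce. Spark NLP provides Wav2Vec2 as the `Wav2Vec2ForCTC` annotator, so transcription can run as a stage of a scalable, production-ready Spark pipeline.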
""", unsafe_allow_html=True)
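# How Wav2Vec2 produces text
st.markdown("""
For transcription, the pretrained Wav2Vec2 encoder is topped with a CTC (Connectionist Temporal Classification) output layer. This layer is the "ForCTC" in the Spark NLP annotator name `Wav2Vec2ForCTC`. For roughly every 20 ms of audio, the layer scores every token in the model's character vocabulary. The text is then read off in three steps: pick the best token per frame, collapse consecutive repeats, and drop a special blank token.

The following plain-Python sketch shows this greedy CTC decoding step. It illustrates the technique only and is not Spark NLP's internal implementation. The function name `ctc_greedy_decode` and the vocabulary layout are assumptions for illustration. The layout places the blank token at index 0 and uses `|` as the word separator, as in typical Hugging Face Wav2Vec2 vocabularies.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      vocab: List[str],
                      blank_id: int = 0,
                      word_delimiter: str = "|") -> str:
    # Hypothetical helper: greedy CTC decoding of per-frame token scores.
    # 1. Pick the highest-scoring token id for each audio frame.
    best_ids = [max(range(len(scores)), key=scores.__getitem__)
                for scores in frame_scores]
    # 2. Collapse runs of the same token id into a single occurrence.
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # 3. Drop blank tokens and turn the word delimiter into a space.
    text = "".join(vocab[i] for i in collapsed if i != blank_id)
    return " ".join(text.replace(word_delimiter, " ").split())


# Toy example: one-hot scores over an assumed six-token vocabulary.
vocab = ["<pad>", "|", "e", "h", "l", "o"]
frame_ids = [3, 3, 2, 0, 4, 4, 0, 4, 5, 1]  # h h e _ l l _ l o |
frames = [[1.0 if j == i else 0.0 for j in range(len(vocab))] for i in frame_ids]
print(ctc_greedy_decode(frames, vocab))  # hello
```

The blank token between the two `l` runs is what allows CTC to emit a genuine double letter. Without it, the repeats would collapse into a single `l`.
""", unsafe_allow_html=True)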
# Why, Where, and When to Use Wav2Vec2
st.markdown('## Why, Where, and When to Use Wav2Vec2', unsafe_allow_html=True)
st.markdown("""
Use Wav2Vec2 when you need a robust ASR solution but have little transcribed audio of your own, since the model reaches good accuracy with limited labeled data. It suits speech-to-text applications where both scalability and accuracy are critical. Typical use cases include:
- Transcription Services: Efficiently convert large volumes of speech into text, vital for media, legal, and healthcare industries.
- Voice-Activated Assistants: Enhance the accuracy of voice commands in smart devices and personal assistants.
- Meeting Transcription: Automatically transcribe meetings so the text can be reviewed, searched, or passed to a summarization model, helping absentees catch up.
- Language Learning Tools: Assist learners in improving pronunciation by providing real-time speech-to-text feedback.
- Accessibility Enhancements: Generate captions for videos and live events, making content accessible to deaf and hard-of-hearing audiences.
- Call Center Analytics: Analyze customer interactions for insights and quality monitoring.
""", unsafe_allow_html=True)
# How to Use the Model
st.markdown('## How to Use the Model', unsafe_allow_html=True)
st.code('''
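import sparknlp
from sparknlp.base import AudioAssembler
from sparknlp.annotator import Wav2Vec2ForCTC
from pyspark.ml import Pipeline

spark = sparknlp.start()  # Spark session with Spark NLP loaded

# audioDf: assumed DataFrame with an "audio_content" column holding each
# recording as an array of floats (16 kHz mono waveform)
audio_assembler = AudioAssembler() \\
    .setInputCol("audio_content") \\
    .setOutputCol("audio_assembler")

# Pretrained English Wav2Vec2 model with a CTC head
speech_to_text = Wav2Vec2ForCTC \\
    .pretrained("asr_wav2vec2_large_xlsr_53_english_by_jonatasgrosman", "en") \\
    .setInputCols(["audio_assembler"]) \\
    .setOutputCol("text")

pipeline = Pipeline(stages=[audio_assembler, speech_to_text])
pipelineModel = pipeline.fit(audioDf)
pipelineDF = pipelineModel.transform(audioDf)

# Display the transcriptions
pipelineDF.select("text.result").show(truncate=False)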
''', language='python')
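# Preparing audio for the pipeline
st.markdown("""
The pipeline above expects `audioDf` to already hold each recording as an array of floats. As a rough sketch of how such values could be produced, the helper below does three things:

- It decodes a 16-bit PCM mono WAV file with Python's standard `wave` module.
- It scales the samples to the range [-1.0, 1.0].
- It optionally peak-normalizes them.

Several choices here are assumptions made to keep the example short: the function name `wav_to_floats`, the 16-bit mono restriction, and rejecting (rather than resampling) audio that is not sampled at 16 kHz.

```python
import io
import sys
import wave
from array import array
from typing import List

TARGET_RATE = 16_000  # sampling rate expected by Wav2Vec2 checkpoints


def wav_to_floats(data: bytes, peak_normalize: bool = True) -> List[float]:
    # Hypothetical helper: decode 16-bit PCM mono WAV bytes into floats.
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM audio")
        if wav.getframerate() != TARGET_RATE:
            raise ValueError(f"expected {TARGET_RATE} Hz audio, got "
                             f"{wav.getframerate()} Hz; resample it first")
        pcm = array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV samples are stored little-endian
    samples = [s / 32768.0 for s in pcm]  # scale int16 to [-1.0, 1.0)
    if peak_normalize:
        peak = max((abs(s) for s in samples), default=0.0)
        if peak > 0:
            samples = [s / peak for s in samples]  # loudest sample becomes +/-1.0
    return samples
```

With an active Spark NLP session named `spark`, a DataFrame suitable for the pipeline could then be built along the lines of `spark.createDataFrame([[wav_to_floats(raw_bytes)]]).toDF("audio_content")`. Here `raw_bytes` is the content of a WAV file read in binary mode.
""", unsafe_allow_html=True)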
# Best Practices & Tips
st.markdown('## Best Practices & Tips', unsafe_allow_html=True)
st.markdown("""
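- Preprocessing: Convert recordings to mono and resample them to 16 kHz, the sampling rate Wav2Vec2 models are trained on. Reducing background noise and normalizing audio levels can further improve transcription quality.
- Fine-tuning: For specialized vocabularies, accents, or acoustic conditions, consider fine-tuning a Wav2Vec2 checkpoint on your own transcribed audio (for example with Hugging Face Transformers) and importing the result into Spark NLP.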
- Batch Processing: Leverage Spark NLP's distributed processing capabilities to handle large-scale audio datasets efficiently.
- Model Evaluation: Regularly evaluate the model's performance on your specific use case using metrics like Word Error Rate (WER) to ensure it meets your accuracy requirements.
- Resource Management: When deploying in production, monitor resource usage, especially for large models, to optimize performance and cost.
""", unsafe_allow_html=True)
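# Measuring Word Error Rate
st.markdown("""
Word Error Rate (WER), mentioned under Model Evaluation, is the word-level edit distance between a reference transcript and the model's output (substitutions + deletions + insertions), divided by the number of words in the reference. The sketch below computes it with a standard dynamic-programming Levenshtein distance.

The function name `word_error_rate` is an assumption, and so is the text normalization: lowercasing and splitting on whitespace, with punctuation left in place. Established evaluation tools may normalize text differently, which changes the score.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    # Hypothetical helper: WER = (S + D + I) / N via word-level Levenshtein distance.
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed so far
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,          # deletion of ref_word
                          curr[j - 1] + 1,      # insertion of hyp_word
                          prev[j - 1] + cost)   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"  # one substitution, one deletion
print(round(word_error_rate(reference, hypothesis), 3))  # 0.333
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis contains many extra words.
""", unsafe_allow_html=True)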
# Model Information
st.markdown('## Model Information', unsafe_allow_html=True)
st.markdown("""
| Attribute | Description |
|---|---|
| Model Name | asr_wav2vec2_large_xlsr_53_english_by_jonatasgrosman |
| Compatibility | Spark NLP 4.2.0+ |
| License | Open Source |
| Edition | Official |
| Input Labels | [audio_assembler] |
| Output Labels | [text] |
| Language | en |
| Size | 1.2 GB |
""", unsafe_allow_html=True)
# Data Source Section
st.markdown('## Data Source', unsafe_allow_html=True)
st.markdown("""
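The original checkpoint is available on Hugging Face. It was published by jonatasgrosman, who fine-tuned the multilingual XLSR-53 Wav2Vec2 model on English speech. The checkpoint has been imported into Spark NLP so that it can run as a stage in large-scale, distributed Spark pipelines.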
""", unsafe_allow_html=True)
# Conclusion
st.markdown('## Conclusion', unsafe_allow_html=True)
st.markdown("""
Wav2Vec2 pairs self-supervised pretraining on raw audio with CTC-based fine-tuning, which lets it reach strong speech recognition accuracy with relatively little labeled data. Spark NLP exposes it as the `Wav2Vec2ForCTC` annotator, so it runs as an ordinary pipeline stage and scales across a Spark cluster. This supports real-world applications ranging from transcription services to voice-activated systems.
""", unsafe_allow_html=True)
# Community & Support
st.markdown('## Community & Support', unsafe_allow_html=True)
st.markdown("""
- Official Website: Comprehensive documentation and examples.
- Slack: Join the community for live discussions and support.
- GitHub: Report issues, request features, and contribute to the project.
- Medium: Read articles and tutorials about Spark NLP.
- YouTube: Watch video tutorials and demonstrations.
""", unsafe_allow_html=True)