import streamlit as st
# Page configuration
st.set_page_config(
layout="wide",
initial_sidebar_state="auto"
)
# Title
st.markdown('Introduction to XLM-RoBERTa Annotators in Spark NLP', unsafe_allow_html=True)
# Subtitle
st.markdown("""
XLM-RoBERTa is a cross-lingual version of RoBERTa (Robustly Optimized BERT Approach) that covers 100 languages. It was pre-trained on 2.5 TB of filtered CommonCrawl text, so a single model can handle NLP tasks across many languages. This makes it well suited to applications that need cross-lingual understanding. Below is an overview of using the XLM-RoBERTa annotator for question answering in Spark NLP.
""", unsafe_allow_html=True)
# XLM-RoBERTa for Question Answering
st.markdown("""Question Answering with XLM-RoBERTa
""", unsafe_allow_html=True)
st.markdown("""
Question answering (QA) is a core task in Natural Language Processing (NLP): given a question and a context passage, the goal is to extract the span of the context that answers the question.
XLM-RoBERTa shares one vocabulary and one set of weights across languages, so a single fine-tuned model can answer questions over text in many languages. This makes it a practical choice for global applications.

Using XLM-RoBERTa for question answering enables:

- Multilingual QA: Extract answers from text in various languages with a single model.
- Contextual Understanding: The question and context are encoded together, so the answer is chosen based on the surrounding context rather than on keyword overlap alone.
- Cross-Domain Flexibility: Apply the same approach to different domains, from customer support to education, across languages.

Advantages of using XLM-RoBERTa for question answering in Spark NLP include:

- Scalability: Spark NLP is built on Apache Spark, so inference scales out across a cluster for large datasets.
- Pretrained Models: Ready-to-use XLM-RoBERTa models fine-tuned for question answering are available, so no training is needed to get started.
- Fewer Models to Maintain: One multilingual model can replace several language-specific ones.
- Seamless Integration: The annotator is a regular Spark ML pipeline stage, so it fits into existing Spark pipelines.
""", unsafe_allow_html=True)
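st.markdown("""
Extractive QA models such as XLM-RoBERTa score every context token twice: once as a possible answer start and once as a possible answer end. The answer is the highest-scoring valid span. The sketch below illustrates that selection step in plain Python. It is a simplified illustration, not Spark NLP's implementation: the scores are made-up numbers, and the helper name `best_span` and its `max_answer_len` cap are assumptions chosen for the example.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_answer_len: int = 15) -> Tuple[int, int]:
    # Return the inclusive (start, end) token span that maximizes
    # start_scores[start] + end_scores[end], with start <= end and at most
    # max_answer_len tokens in the span.
    if not start_scores or len(start_scores) != len(end_scores):
        raise ValueError('score lists must be non-empty and of equal length')
    best, best_score = (0, 0), float('-inf')
    for start, start_score in enumerate(start_scores):
        # Only ends at or after the start, within the length cap, are valid.
        for end in range(start, min(len(end_scores), start + max_answer_len)):
            score = start_score + end_scores[end]
            if score > best_score:
                best, best_score = (start, end), score
    return best


# Made-up scores for the context tokens of the example sentence.
tokens = ['My', 'name', 'is', 'Clara', 'and', 'I', 'live', 'in', 'Berkeley', '.']
start_scores = [0.1, 0.2, 0.1, 4.0, 0.3, 0.2, 0.1, 0.2, 1.5, 0.0]
end_scores = [0.0, 0.1, 0.2, 3.5, 0.4, 0.1, 0.2, 0.1, 2.0, 0.3]
start, end = best_span(start_scores, end_scores)
print(' '.join(tokens[start:end + 1]))  # Clara
```

Requiring `end >= start` rules out inverted spans. The length cap keeps the search cheap and rules out implausibly long answers. Real implementations also skip positions that belong to the question or to special tokens.
""", unsafe_allow_html=True)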
st.markdown("""How to Use XLM-RoBERTa for Question Answering in Spark NLP
""", unsafe_allow_html=True)
st.markdown("""
Spark NLP sets up question answering as a two-stage pipeline. `MultiDocumentAssembler` converts the `question` and `context` columns into document annotations. `XlmRoBertaForQuestionAnswering` then reads both documents and returns the answer extracted from the context. The pipeline below runs on a single English example. Because XLM-RoBERTa is multilingual, the same setup applies to other languages, although answer quality depends on the languages the chosen model was fine-tuned on.
""", unsafe_allow_html=True)
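st.markdown("""
The model does not receive the question and context as two separate inputs. Instead, they are concatenated into one sequence. Hugging Face's XLM-RoBERTa tokenizer, for example, packs a pair as `<s> question </s></s> context </s>`, and extractive QA models only pick answer tokens from the context part. The sketch below shows that packing in plain Python. It assumes whitespace-style tokens instead of XLM-RoBERTa's SentencePiece pieces, and the helper `pack_pair` is hypothetical. Spark NLP's annotator handles this step internally, and its exact implementation may differ.

```python
from typing import List, Tuple


def pack_pair(question: List[str], context: List[str]) -> Tuple[List[str], int]:
    # Pack a question/context pair in the RoBERTa-style layout
    # <s> question </s></s> context </s> and return the packed tokens
    # together with the position of the first context token.
    packed = ['<s>'] + question + ['</s>', '</s>']
    context_start = len(packed)
    packed += context + ['</s>']
    return packed, context_start


# Whitespace-style tokens stand in for real SentencePiece pieces here.
question = ["What's", 'my', 'name', '?']
context = ['My', 'name', 'is', 'Clara', 'and', 'I', 'live', 'in', 'Berkeley', '.']
packed, context_start = pack_pair(question, context)
print(packed)
print('first context token:', packed[context_start])  # first context token: My
# Answer spans may only use positions context_start .. len(packed) - 2.
```

Recording `context_start` lets a QA model ignore question positions when it searches for the answer span.
""", unsafe_allow_html=True)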
# Code Example
st.code('''
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Start a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Turn the "question" and "context" columns into document annotations
document_assembler = MultiDocumentAssembler() \\
    .setInputCols(["question", "context"]) \\
    .setOutputCols(["document_question", "document_context"])

# Pretrained XLM-RoBERTa model fine-tuned for extractive question answering
spanClassifier = XlmRoBertaForQuestionAnswering.pretrained("xlm_roberta_qa_Part_1_XLM_Model_E1", "en") \\
    .setInputCols(["document_question", "document_context"]) \\
    .setOutputCol("answer") \\
    .setCaseSensitive(True)

pipeline = Pipeline().setStages([document_assembler, spanClassifier])

example = spark.createDataFrame(
    [["What's my name?", "My name is Clara and I live in Berkeley."]]
).toDF("question", "context")

# Both stages are already trained, so fit() only assembles the PipelineModel
result = pipeline.fit(example).transform(example)
result.select("answer.result").show(truncate=False)
''', language='python')
st.text("""
+-------+
|result |
+-------+
|[Clara]|
+-------+
""")
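st.markdown("""
The `answer.result` column holds text, while the model itself predicts token positions. Those positions are mapped back to character offsets in the context to cut out the answer string. The sketch below shows that mapping with a simple whitespace tokenizer that records offsets. The helpers `tokenize_with_offsets` and `span_to_text` are hypothetical, and Spark NLP's own tokenization and offset bookkeeping differ in detail.

```python
from typing import List, Tuple

Token = Tuple[str, int, int]  # (text, begin, end) with end exclusive


def tokenize_with_offsets(text: str) -> List[Token]:
    # Split on whitespace and record each token's character offsets.
    tokens = []
    position = 0
    for word in text.split():
        begin = text.find(word, position)
        end = begin + len(word)
        tokens.append((word, begin, end))
        position = end
    return tokens


def span_to_text(text: str, tokens: List[Token], start: int, end: int) -> str:
    # Map an inclusive token span back to the exact substring of the context,
    # keeping the original spacing and punctuation.
    return text[tokens[start][1]:tokens[end][2]]


context = 'My name is Clara and I live in Berkeley.'
tokens = tokenize_with_offsets(context)
# Suppose the model picked token 3 as both the start and the end of the answer.
print(span_to_text(context, tokens, 3, 3))  # Clara
```

Slicing the original string, instead of re-joining tokens, preserves the exact spacing and punctuation of the source text.
""", unsafe_allow_html=True)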
# Model Info Section
st.markdown('Choosing the Right Model', unsafe_allow_html=True)
st.markdown("""
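The model used here, `xlm_roberta_qa_Part_1_XLM_Model_E1`, is a pretrained XLM-RoBERTa checkpoint fine-tuned for extractive question answering. Other XLM-RoBERTa question answering models, fine-tuned on different datasets and languages, can be swapped in by passing their name and language code to `pretrained()`.
For more information about the available models, visit the Spark NLP Models Hub.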
""", unsafe_allow_html=True)
# Footer
st.markdown("""
- Official Website: Documentation and examples
- Slack: Live discussion with the community and team
- GitHub: Bug reports, feature requests, and contributions
- Medium: Spark NLP articles
- YouTube: Video tutorials
""", unsafe_allow_html=True)