import streamlit as st

# Page configuration
st.set_page_config(
    layout="wide", 
    initial_sidebar_state="auto"
)

# Custom CSS for better styling
st.markdown("""
    <style>
        .main-title {
            font-size: 36px;
            color: #4A90E2;
            font-weight: bold;
            text-align: center;
        }
        .sub-title {
            font-size: 24px;
            color: #4A90E2;
            margin-top: 20px;
        }
        .section {
            background-color: #f9f9f9;
            padding: 15px;
            border-radius: 10px;
            margin-top: 20px;
        }
        .section h2 {
            font-size: 22px;
            color: #4A90E2;
        }
        .section p, .section ul {
            color: #666666;
        }
        .link {
            color: #4A90E2;
            text-decoration: none;
        }
        .benchmark-table {
            width: 100%;
            border-collapse: collapse;
            margin-top: 20px;
        }
        .benchmark-table th, .benchmark-table td {
            border: 1px solid #ddd;
            padding: 8px;
            text-align: left;
        }
        .benchmark-table th {
            background-color: #4A90E2;
            color: white;
        }
        .benchmark-table td {
            background-color: #f2f2f2;
        }
    </style>
""", unsafe_allow_html=True)

# Title
st.markdown('<div class="main-title">Introduction to XLM-RoBERTa Annotators in Spark NLP</div>', unsafe_allow_html=True)

# Introduction
st.markdown("""
<div class="section">
    <p>XLM-RoBERTa is a multilingual model that extends RoBERTa (Robustly Optimized BERT Approach) to 100 languages. Pre-trained on a large, diverse multilingual corpus, it handles a range of NLP tasks in a multilingual context, which makes it a good fit for applications that require cross-lingual understanding. This page gives an overview of the XLM-RoBERTa annotator for question answering.</p>
</div>
""", unsafe_allow_html=True)

# XLM-RoBERTa for Question Answering
st.markdown("""<div class="sub-title">Question Answering with XLM-RoBERTa</div>""", unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <p>Question answering (QA) is a Natural Language Processing (NLP) task in which the goal is to extract an answer from a given context in response to a specific question.</p>
    <p><strong>XLM-RoBERTa</strong> performs well on question answering across multiple languages, which makes it useful for global applications. The sections below show how to run question answering with XLM-RoBERTa in Spark NLP.</p>
    <p>Using XLM-RoBERTa for Question Answering enables:</p>
    <ul>
        <li><strong>Multilingual QA:</strong> Extract answers from text in various languages with a single model.</li>
        <li><strong>Accurate Contextual Understanding:</strong> Use XLM-RoBERTa's contextual representations to locate precise answer spans.</li>
        <li><strong>Cross-Domain Flexibility:</strong> Apply it to different domains, from customer support to education, across languages.</li>
    </ul>
    <p>Advantages of using XLM-RoBERTa for Question Answering in Spark NLP include:</p>
    <ul>
        <li><strong>Scalability:</strong> Spark NLP is built on Apache Spark, so pipelines scale to large datasets.</li>
        <li><strong>Pretrained Models:</strong> Use pretrained, fine-tuned models directly instead of training your own.</li>
        <li><strong>Multilingual Flexibility:</strong> XLM-RoBERTa’s multilingual capabilities reduce the need for language-specific models.</li>
        <li><strong>Seamless Integration:</strong> Add XLM-RoBERTa as a stage in your existing Spark pipelines.</li>
    </ul>
</div>
""", unsafe_allow_html=True)

st.markdown("""<div class="sub-title">How to Use XLM-RoBERTa for Question Answering in Spark NLP</div>""", unsafe_allow_html=True)
st.markdown("""
<div class="section">
<p>Spark NLP exposes XLM-RoBERTa question answering as a regular pipeline stage. The following example pairs a question with a context passage and extracts the answer span from that context. Because XLM-RoBERTa is trained on multilingual data, the same pipeline structure can be used for question answering in other languages.</p>
</div>
""", unsafe_allow_html=True)

# Code Example
st.code('''
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Start (or reuse) a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Turn the raw question and context columns into document annotations
document_assembler = MultiDocumentAssembler() \\
    .setInputCols(["question", "context"]) \\
    .setOutputCols(["document_question", "document_context"])

# Pretrained XLM-RoBERTa span classifier that extracts the answer from the context
span_classifier = XlmRoBertaForQuestionAnswering.pretrained("xlm_roberta_qa_Part_1_XLM_Model_E1", "en") \\
    .setInputCols(["document_question", "document_context"]) \\
    .setOutputCol("answer") \\
    .setCaseSensitive(True)

pipeline = Pipeline().setStages([document_assembler, span_classifier])

# One question/context pair as a two-column DataFrame
example = spark.createDataFrame([["What's my name?", "My name is Clara and I live in Berkeley."]]).toDF("question", "context")

result = pipeline.fit(example).transform(example)
result.select("answer.result").show(truncate=False)
''', language='python')

st.text("""
+-----------+
|   result  |
+-----------+
|[Clara]    |
+-----------+
""")
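
# Batch usage sketch (an illustrative addition, not part of the original example)
st.markdown("""
<div class="section">
<p>The same pipeline can also be applied to several question/context pairs in one pass. The snippet below is a minimal sketch under stated assumptions: it reuses the <code>pipeline</code> and <code>spark</code> objects from the example above, and the sample rows are made up for illustration.</p>
</div>
""", unsafe_allow_html=True)

st.code('''
# Illustrative rows (assumed data): each row is one question/context pair
examples = spark.createDataFrame([
    ["What's my name?", "My name is Clara and I live in Berkeley."],
    ["Where do I live?", "My name is Clara and I live in Berkeley."],
]).toDF("question", "context")

# Fit once, then transform every row in a single pass
model = pipeline.fit(examples)
answers = model.transform(examples)

# Show each question next to its extracted answer span
answers.select("question", "answer.result").show(truncate=False)
''', language='python')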

# Model Info Section
st.markdown('<div class="sub-title">Choosing the Right Model</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <p>The model used in this example, <code>xlm_roberta_qa_Part_1_XLM_Model_E1</code>, is a pretrained XLM-RoBERTa model fine-tuned for question answering.</p>
    <p>For more information about the underlying base model, visit the <a class="link" href="https://huggingface.co/xlm-roberta-base" target="_blank">XLM-RoBERTa Model Hub</a>.</p>
</div>
""", unsafe_allow_html=True)

# References Section
st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://arxiv.org/abs/1911.02116" target="_blank">XLM-R: Cross-lingual Pre-training</a></li>
        <li><a class="link" href="https://huggingface.co/xlm-roberta-base" target="_blank">XLM-RoBERTa Model Overview</a></li>
    </ul>
</div>
""", unsafe_allow_html=True)

# Footer
st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Documentation and examples</li>
        <li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Live discussion with the community and team</li>
        <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Bug reports, feature requests, and contributions</li>
        <li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Spark NLP articles</li>
        <li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Video tutorials</li>
    </ul>
</div>
""", unsafe_allow_html=True)

st.markdown('<div class="sub-title">Quick Links</div>', unsafe_allow_html=True)

st.markdown("""
<div class="section">
    <ul>
        <li><a class="link" href="https://sparknlp.org/docs/en/quickstart" target="_blank">Getting Started</a></li>
        <li><a class="link" href="https://nlp.johnsnowlabs.com/models" target="_blank">Pretrained Models</a></li>
        <li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp/tree/master/examples/python/annotation/text/english" target="_blank">Example Notebooks</a></li>
        <li><a class="link" href="https://sparknlp.org/docs/en/install" target="_blank">Installation Guide</a></li>
    </ul>
</div>
""", unsafe_allow_html=True)