import streamlit as st
# Page configuration
st.set_page_config(
    layout="wide",
    initial_sidebar_state="auto"
)
# Custom CSS for better styling
st.markdown("""
<style>
    .main-title {
        font-size: 36px;
        color: #4A90E2;
        font-weight: bold;
        text-align: center;
    }
    .sub-title {
        font-size: 24px;
        color: #4A90E2;
        margin-top: 20px;
    }
    .section {
        background-color: #f9f9f9;
        padding: 15px;
        border-radius: 10px;
        margin-top: 20px;
    }
    .section h2 {
        font-size: 22px;
        color: #4A90E2;
    }
    .section p, .section ul {
        color: #666666;
    }
    .link {
        color: #4A90E2;
        text-decoration: none;
    }
</style>
""", unsafe_allow_html=True)
# Title
st.markdown('<div class="main-title">Automatically Answer Questions (OPEN BOOK)</div>', unsafe_allow_html=True)
# Introduction Section
st.markdown("""
<div class="section">
<p>Open-book question answering is a task where a model generates answers based on provided text or documents. Unlike closed-book models, open-book models utilize external sources to produce responses, making them more accurate and versatile in scenarios where the input text provides essential context.</p>
<p>This page explores how to implement an open-book question-answering pipeline using state-of-the-art NLP techniques. We use a T5 Transformer model, which is well-suited for generating detailed answers by leveraging the information contained within the input text.</p>
</div>
""", unsafe_allow_html=True)
# T5 Transformer Overview
st.markdown('<div class="sub-title">Understanding the T5 Transformer for Open-Book QA</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<p>The T5 (Text-To-Text Transfer Transformer) model by Google excels in converting various NLP tasks into a unified text-to-text format. For open-book question answering, the model takes a question and relevant context as input, generating a detailed and contextually appropriate answer.</p>
<p>The T5 model's ability to utilize provided documents makes it especially powerful in applications where the accuracy of the response is enhanced by access to supporting information, such as research tools, educational applications, or any system where the input text contains critical data.</p>
</div>
""", unsafe_allow_html=True)
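st.markdown("""
<div class="section">
<p>As a minimal sketch of this text-to-text setup (plain Python, no Spark NLP required), the input for an open-book question is typically assembled by pairing the question with its supporting context. The exact prompt layout expected by a given checkpoint may vary, so the helper below is an illustration rather than a fixed API:</p>
</div>
""", unsafe_allow_html=True)
st.code('''
def build_open_book_input(question: str, context: str) -> str:
    # With .setTask("question:") the annotator prepends the task prefix,
    # so the input row carries only the question followed by its context.
    return f"{question.strip()} context: {context.strip()}"

text = build_open_book_input(
    "What is the impact of climate change on polar bears?",
    "Melting sea ice reduces the hunting grounds polar bears depend on."
)
''', language='python')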
# Performance Section
st.markdown('<div class="sub-title">Performance and Benchmarks</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<p>In open-book settings, the T5 model has been benchmarked across various datasets, demonstrating its capability to generate accurate and comprehensive answers when given relevant context. Its performance has been particularly strong in tasks requiring a deep understanding of the input text to produce correct and context-aware responses.</p>
<p>Open-book T5 models are especially valuable in applications that require dynamic interaction with content, making them ideal for domains such as customer support, research, and educational technologies.</p>
</div>
""", unsafe_allow_html=True)
# Implementation Section
st.markdown('<div class="sub-title">Implementing Open-Book Question Answering</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<p>The following example demonstrates how to implement an open-book question answering pipeline using Spark NLP. The pipeline includes a document assembler and the T5 model to generate answers based on the input text.</p>
</div>
""", unsafe_allow_html=True)
st.code('''
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Convert the raw input text into Spark NLP document annotations
document_assembler = DocumentAssembler() \\
    .setInputCol("text") \\
    .setOutputCol("documents")

# Load a pretrained T5 checkpoint (e.g. t5_base) and configure it for QA
t5 = T5Transformer.pretrained("t5_base") \\
    .setTask("question:") \\
    .setMaxOutputLength(200) \\
    .setInputCols(["documents"]) \\
    .setOutputCol("answers")

pipeline = Pipeline().setStages([document_assembler, t5])

data = spark.createDataFrame([["What is the impact of climate change on polar bears?"]]).toDF("text")
result = pipeline.fit(data).transform(data)
result.select("answers.result").show(truncate=False)
''', language='python')
# Example Output
st.text("""
+------------------------------------------------+
|result                                          |
+------------------------------------------------+
|[Climate change significantly affects polar ...]|
+------------------------------------------------+
""")
# Model Info Section
st.markdown('<div class="sub-title">Choosing the Right Model for Open-Book QA</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<p>When selecting a model for open-book question answering, it's important to consider the specific needs of your application. Below are some of the available models, each offering different strengths based on their transformer architecture:</p>
<ul>
<li><b>t5_base</b>: A versatile model that provides strong performance on question-answering tasks, ideal for applications requiring detailed answers.</li>
<li><b>t5_small</b>: A more lightweight variant of T5, suitable for applications where resource efficiency is crucial, though it may not be as accurate as larger models.</li>
<li><b>albert_qa_xxlarge_tweetqa</b>: Based on the ALBERT architecture, this model is fine-tuned for the TweetQA dataset, making it effective for answering questions in shorter text formats.</li>
<li><b>bert_qa_callmenicky_finetuned_squad</b>: A fine-tuned BERT model that offers a good balance between accuracy and computational efficiency, suitable for general-purpose QA tasks.</li>
<li><b>deberta_v3_xsmall_qa_squad2</b>: A smaller DeBERTa model, optimized for high accuracy on SQuAD2 while being resource-efficient, making it great for smaller deployments.</li>
<li><b>distilbert_base_cased_qa_squad2</b>: A distilled version of BERT, offering faster inference times with slightly reduced accuracy, suitable for environments with limited resources.</li>
<li><b>longformer_qa_large_4096_finetuned_triviaqa</b>: This model is particularly well-suited for open-book QA tasks involving long documents, as it can handle extended contexts effectively.</li>
<li><b>roberta_qa_roberta_base_squad2_covid</b>: A RoBERTa-based model fine-tuned for COVID-related QA, making it highly specialized for health-related domains.</li>
<li><b>roberta_qa_CV_Merge_DS</b>: Another RoBERTa model, fine-tuned on a diverse dataset, offering versatility across different domains and question types.</li>
<li><b>xlm_roberta_base_qa_squad2</b>: A multilingual model fine-tuned on SQuAD2, ideal for QA tasks across various languages.</li>
</ul>
<p>Among these models, <b>t5_base</b> and <b>longformer_qa_large_4096_finetuned_triviaqa</b> are highly recommended for their strong performance in generating accurate and contextually rich answers, especially in scenarios with long input texts. For faster responses with an emphasis on efficiency, <b>distilbert_base_cased_qa_squad2</b> and <b>deberta_v3_xsmall_qa_squad2</b> are excellent choices. Specialized tasks may benefit from models like <b>albert_qa_xxlarge_tweetqa</b> or <b>roberta_qa_roberta_base_squad2_covid</b>, depending on the domain.</p>
<p>Explore the available models on the <a class="link" href="https://sparknlp.org/models?annotator=T5Transformer" target="_blank">Spark NLP Models Hub</a> to find the one that best suits your needs.</p>
</div>
""", unsafe_allow_html=True)
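st.markdown("""
<div class="section">
<p>Switching between these models only requires changing the name passed to <b>pretrained()</b> in the pipeline above. As a sketch (model names taken from the list above; the helper is hypothetical, and availability should be confirmed on the Models Hub), a simple lookup can map a deployment priority to a candidate model:</p>
</div>
""", unsafe_allow_html=True)
st.code('''
# Hypothetical lookup from deployment priority to one of the models listed above
MODEL_BY_PRIORITY = {
    "accuracy": "t5_base",
    "long_context": "longformer_qa_large_4096_finetuned_triviaqa",
    "efficiency": "distilbert_base_cased_qa_squad2",
    "multilingual": "xlm_roberta_base_qa_squad2",
}

def pick_model(priority: str) -> str:
    return MODEL_BY_PRIORITY.get(priority, "t5_base")
''', language='python')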
# References Section
st.markdown('<div class="sub-title">References</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<ul>
<li><a class="link" href="https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html" target="_blank">Google AI Blog</a>: Exploring Transfer Learning with T5</li>
<li><a class="link" href="https://sparknlp.org/models?annotator=T5Transformer" target="_blank">Spark NLP Model Hub</a>: Explore T5 models</li>
<li><a class="link" href="https://github.com/google-research/text-to-text-transfer-transformer" target="_blank">GitHub</a>: T5 Transformer repository</li>
<li><a class="link" href="https://arxiv.org/abs/1910.10683" target="_blank">T5 Paper</a>: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer</li>
</ul>
</div>
""", unsafe_allow_html=True)
st.markdown('<div class="sub-title">Community & Support</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<ul>
<li><a class="link" href="https://sparknlp.org/" target="_blank">Official Website</a>: Documentation and examples</li>
<li><a class="link" href="https://join.slack.com/t/spark-nlp/shared_invite/zt-198dipu77-L3UWNe_AJ8xqDk0ivmih5Q" target="_blank">Slack</a>: Live discussion with the community and team</li>
<li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp" target="_blank">GitHub</a>: Bug reports, feature requests, and contributions</li>
<li><a class="link" href="https://medium.com/spark-nlp" target="_blank">Medium</a>: Spark NLP articles</li>
<li><a class="link" href="https://www.youtube.com/channel/UCmFOjlpYEhxf_wJUDuz6xxQ/videos" target="_blank">YouTube</a>: Video tutorials</li>
</ul>
</div>
""", unsafe_allow_html=True)
st.markdown('<div class="sub-title">Quick Links</div>', unsafe_allow_html=True)
st.markdown("""
<div class="section">
<ul>
<li><a class="link" href="https://sparknlp.org/docs/en/quickstart" target="_blank">Getting Started</a></li>
<li><a class="link" href="https://nlp.johnsnowlabs.com/models" target="_blank">Pretrained Models</a></li>
<li><a class="link" href="https://github.com/JohnSnowLabs/spark-nlp/tree/master/examples/python/annotation/text/english" target="_blank">Example Notebooks</a></li>
<li><a class="link" href="https://sparknlp.org/docs/en/install" target="_blank">Installation Guide</a></li>
</ul>
</div>
""", unsafe_allow_html=True)