import streamlit as st

# Custom CSS for better styling
st.markdown("""
""", unsafe_allow_html=True)

# Main Title
st.markdown('Detect Time-related Terminology', unsafe_allow_html=True)

# Description
st.markdown("""

Detecting time-related terminology is a crucial NLP task: identifying and classifying temporal entities in text. This app leverages the roberta_token_classifier_timex_semeval model, imported from Hugging Face, which pairs RoBERTa embeddings with RobertaForTokenClassification to recognize time-related terminology as a token-level NER task.
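For contrast, here is a toy keyword baseline (illustrative only, not part of this app's pipeline). A simple lookup can find weekday names, but it cannot label context-dependent spans such as "3 days" as a calendar interval, which is exactly where a trained NER model helps:

```python
# Toy keyword baseline (illustrative only, not this app's pipeline):
# a set lookup finds weekday names but misses context-dependent
# temporal spans such as "3 days".
WEEKDAYS = {"Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"}
text = "Model training was started at 22:12C and it took 3 days from Tuesday to Friday."
hits = [(word, "Day-Of-Week")
        for word in text.replace(".", " ").split()
        if word in WEEKDAYS]
print(hits)  # [('Tuesday', 'Day-Of-Week'), ('Friday', 'Day-Of-Week')]
```

Everything beyond literal weekday names is invisible to this baseline; the model below handles the rest.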

""", unsafe_allow_html=True) # What is NER st.markdown('
What is Named Entity Recognition (NER)?
', unsafe_allow_html=True) st.markdown("""

Named Entity Recognition (NER) is a Natural Language Processing (NLP) task that locates named entities in text and classifies them into predefined categories such as dates, times, durations, and other temporal expressions. For example, in the sentence "Model training was started at 22:12C and it took 3 days from Tuesday to Friday," NER identifies '22:12C' as a time expression, '3 days' as a calendar interval, and 'Tuesday' and 'Friday' as days of the week.

NER models are trained to understand the context and semantics of entities within text, enabling automated systems to recognize and categorize these entities accurately. This capability is essential for developing intelligent systems capable of processing and responding to user queries efficiently.
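Under the hood, a token classifier emits one tag per token (commonly in BIO format), and a converter then groups those tags into entity chunks. A minimal sketch of that grouping step, using made-up tags rather than actual model output:

```python
# Minimal sketch: merge token-level BIO tags into (chunk, entity) pairs.
# The tags below are made up for illustration, not real model output.
tokens = ["took", "3", "days", "from", "Tuesday", "to", "Friday"]
tags = ["B-Frequency", "B-Number", "B-Calendar-Interval", "O",
        "B-Day-Of-Week", "B-Between", "B-Day-Of-Week"]

def group_chunks(tokens, tags):
    # B- starts a new chunk; I- with a matching label extends it; O resets.
    chunks, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current = [tok, tag[2:]]
            chunks.append(current)
        elif tag.startswith("I-") and current is not None and tag[2:] == current[1]:
            current[0] += " " + tok
        else:
            current = None
    return [tuple(c) for c in chunks]

print(group_chunks(tokens, tags))
```

This mirrors, in spirit, what a chunk converter does after token classification; the real pipeline stage is shown in the usage example below.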

""", unsafe_allow_html=True) # Predicted Entities st.markdown('
Predicted Entities
', unsafe_allow_html=True) st.markdown("""
""", unsafe_allow_html=True) # How to Use the Model st.markdown('
How to Use the Model
', unsafe_allow_html=True) st.markdown("""

To use this model, follow these steps in Python:

""", unsafe_allow_html=True) st.code(''' from sparknlp.base import * from sparknlp.annotator import * from pyspark.ml import Pipeline from pyspark.sql.functions import col, expr, round, concat, lit # Define the components of the pipeline document_assembler = DocumentAssembler() \\ .setInputCol("text") \\ .setOutputCol("document") sentence_detector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "en") \\ .setInputCols(["document"]) \\ .setOutputCol("sentence") tokenizer = Tokenizer() \\ .setInputCols(["sentence"]) \\ .setOutputCol("token") token_classifier = RoBertaForTokenClassification.pretrained("roberta_token_classifier_timex_semeval", "en") \\ .setInputCols(["sentence", "token"]) \\ .setOutputCol("ner") ner_converter = NerConverter() \\ .setInputCols(["sentence", "token", "ner"]) \\ .setOutputCol("ner_chunk") # Create the pipeline pipeline = Pipeline(stages=[ document_assembler, sentence_detector, tokenizer, token_classifier, ner_converter ]) # Create some example data text = "Model training was started at 22:12C and it took 3 days from Tuesday to Friday." data = spark.createDataFrame([[text]]).toDF("text") # Apply the pipeline to the data model = pipeline.fit(data) result = model.transform(data) # Select the result, entity result.select( expr("explode(ner_chunk) as ner_chunk") ).select( col("ner_chunk.result").alias("chunk"), col("ner_chunk.metadata.entity").alias("entity") ).show(truncate=False) ''', language='python') # Results st.text(""" +-------+-----------------+ |chunk |entity | +-------+-----------------+ |took |Frequency | |3 |Number | |days |Calendar-Interval| |Tuesday|Day-Of-Week | |to |Between | |Friday |Day-Of-Week | +-------+-----------------+ """) # Model Information st.markdown('
Model Information
', unsafe_allow_html=True) st.markdown("""
| Attribute | Value |
|---|---|
| Model Name | roberta_token_classifier_timex_semeval |
| Compatibility | Spark NLP 3.3.4+ |
| License | Open Source |
| Edition | Official |
| Input Labels | [sentence, token] |
| Output Labels | [ner] |
| Language | en |
| Size | 439.5 MB |
| Case sensitive | true |
| Max sentence length | 256 |
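Because the model caps input at 256 tokens per sentence, longer inputs need to be segmented upstream. A minimal sketch, assuming whitespace tokens as a stand-in for the model's real tokenizer (an approximation, not this app's actual preprocessing):

```python
# Minimal sketch: split a long token sequence into segments that fit
# the model's 256-token sentence limit. Whitespace tokens stand in
# for the model's real tokenizer here.
MAX_LEN = 256

def segment(tokens, max_len=MAX_LEN):
    # Slice the token list into consecutive windows of at most max_len.
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]

parts = segment(["tok"] * 600)
print([len(p) for p in parts])  # [256, 256, 88]
```

In the Spark NLP pipeline above, the sentence detector already keeps most real-world sentences well under this limit.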
""", unsafe_allow_html=True) # Data Source st.markdown('
Data Source
', unsafe_allow_html=True) st.markdown("""

For more information about the dataset used to train this model, visit the Hugging Face page.

""", unsafe_allow_html=True) # Conclusion st.markdown('
Conclusion
', unsafe_allow_html=True) st.markdown("""

Detecting time-related terminology is essential for a wide range of applications. This model, leveraging RoBERTa embeddings and RobertaForTokenClassification, provides robust capabilities for identifying and classifying temporal entities within text.

By integrating this model into your systems, you can enhance scheduling, event tracking, historical data analysis, and more. Its fine-grained coverage of time-related entity types makes it a valuable tool for many applications.

""", unsafe_allow_html=True) # References st.markdown('
References
', unsafe_allow_html=True) st.markdown("""
""", unsafe_allow_html=True) # Community & Support st.markdown('
Community & Support
', unsafe_allow_html=True) st.markdown("""
""", unsafe_allow_html=True)