Model Card for chreh/persuasive_language_detector
Given a sentence, our model predicts whether or not the sentence contains "persuasive" language, that is, language designed to elicit emotions or change readers' opinions. The model was fine-tuned on the SemEval 2020 Task 11 dataset. We preprocessed the dataset to adapt it from multi-label technique classification and span classification to our binary classification task.
There are two revisions:
- BERT - we fine-tuned bert-large-cased on our main branch.
- XLM-RoBERTa - we fine-tuned xlm-roberta-base on our roberta branch.
Model Details
Model Description
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
- Developed by: Ultraviolet Text
- Model type: BERT / XLM-RoBERTa
- Language(s) (NLP): English
- License: MIT
- Finetuned from model: bert-large-cased / xlm-roberta-base
How to Get Started with the Model
Use the code below to get started with the model.
Loading from the main branch (BERT)
from transformers import AutoModelForSequenceClassification, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-large-cased")
model = AutoModelForSequenceClassification.from_pretrained("chreh/persuasive_language_detector")
Loading from the roberta branch (XLM-RoBERTa)
from transformers import AutoModelForSequenceClassification, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("chreh/persuasive_language_detector", revision="roberta")
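Once a tokenizer and model are loaded as above, classifying a sentence could look roughly like the sketch below. It rests on two assumptions that are not confirmed by this card: that the classification head has two outputs, and that class index 1 means "persuasive" (check model.config.id2label before relying on it). The names softmax, classify, and demo are hypothetical helpers introduced for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(sentence, model, tokenizer, persuasive_index=1):
    """Return (is_persuasive, probability) for a single sentence.

    ASSUMPTION: a two-class head where `persuasive_index` (default 1)
    is the "persuasive" class; verify against model.config.id2label.
    """
    import torch  # imported lazily so softmax() works without torch installed

    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    prob = softmax(logits)[persuasive_index]
    return prob >= 0.5, prob


def demo():
    """Download the roberta revision and classify one example sentence."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "chreh/persuasive_language_detector", revision="roberta"
    )
    model.eval()  # disable dropout for inference
    print(classify("You would be a fool not to act now!", model, tokenizer))
```

Calling demo() downloads the weights on first use. The 0.5 threshold is an arbitrary choice for this sketch and can be raised or lowered to trade precision against recall.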
Training Details
Training Data
Training data can be downloaded from the SemEval website.
Training Procedure
Training was done with the Hugging Face Trainer on both our local machines and Intel Developer Cloud kernels, which let us prototype multiple models simultaneously.
Preprocessing
All sentences containing spans of persuasive language techniques were labeled as persuasive language examples, while all others were labeled as examples of non-persuasive language.
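The labeling rule above can be sketched as follows. This is an illustration rather than our actual preprocessing script: it assumes that technique spans are given as (start, end) character offsets into the article text with an exclusive end, and that each non-empty line of an article is one sentence. The function name label_sentences is hypothetical.

```python
def label_sentences(article_text, spans):
    """Return (sentence, label) pairs for one article.

    label is 1 if any technique span overlaps the sentence, else 0.
    ASSUMPTIONS: `spans` holds (start, end) character offsets into
    `article_text` (end exclusive); each non-empty line is a sentence.
    """
    examples = []
    offset = 0
    for line in article_text.split("\n"):
        start, end = offset, offset + len(line)
        offset = end + 1  # step past the newline character
        if not line.strip():
            continue  # skip blank lines between paragraphs
        overlaps = any(s < end and e > start for s, e in spans)
        examples.append((line, int(overlaps)))
    return examples


# Toy article: one annotated span covering "destroying our country".
article = (
    "Plain fact here.\n"
    "They are destroying our country!\n"
    "\n"
    "Another neutral line."
)
span_start = article.index("destroying our country")
toy_spans = [(span_start, span_start + len("destroying our country"))]
labeled = label_sentences(article, toy_spans)
```

A span that crosses a line break marks every sentence it touches as persuasive, which matches the "contains a span" rule at the cost of some label noise.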
Testing Data, Factors & Metrics
Testing Data
The test data is the test split of sem_eval_2020_task_11, which can be downloaded from the original website.
The test data contains 38.25% persuasive examples and 61.75% non-persuasive examples. Metrics can be found in the following section.
Metrics
Metrics are reported in the format (main branch), (roberta branch).
- Accuracy - 0.7165140725669719, 0.7326693227091633
- Recall - 0.6875584658559402, 0.6822916666666666
- Precision - 0.5941794664510913, 0.6415279138099902
- F1 - 0.6374674761491761, 0.6612821807168097
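As a sanity check on the numbers above, F1 is the harmonic mean of precision and recall, and a classifier that always predicts "non-persuasive" would reach 61.75% accuracy on this test set. The short sketch below recomputes F1 from the reported precision and recall and compares accuracy against that majority-class baseline. The dictionary layout and function name are illustrative choices, not part of our evaluation code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Values copied from the metrics list above.
reported = {
    "main (BERT)": {
        "accuracy": 0.7165140725669719,
        "recall": 0.6875584658559402,
        "precision": 0.5941794664510913,
        "f1": 0.6374674761491761,
    },
    "roberta (XLM-RoBERTa)": {
        "accuracy": 0.7326693227091633,
        "recall": 0.6822916666666666,
        "precision": 0.6415279138099902,
        "f1": 0.6612821807168097,
    },
}

# Accuracy of always predicting the majority (non-persuasive) class.
MAJORITY_BASELINE = 0.6175

for name, m in reported.items():
    recomputed = f1_score(m["precision"], m["recall"])
    gain = m["accuracy"] - MAJORITY_BASELINE
    print(f"{name}: F1 reported={m['f1']:.4f} recomputed={recomputed:.4f} "
          f"accuracy gain over baseline={gain:+.4f}")
```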
Overall, the roberta branch performs better: it has higher accuracy, precision, and F1, with only slightly lower recall, and it has faster inference times. Thus, we recommend that users download the roberta revision.