TACO -- Twitter Arguments from COnversations

TACO is a baseline classification model built upon AutoModelForSequenceClassification that assigns each tweet to one of four classes: Reason, Statement, Notification, and None. Tailored specifically to argument mining on Twitter, it is a fine-tuned version of BERTweet-base, which was originally pre-trained on Twitter data. The model takes its name from the TACO dataset on which it was fine-tuned to extract Twitter Arguments from COnversations.

Class Semantics

The TACO framework revolves around the two key elements of an argument, as defined by the Cambridge Dictionary. It encodes inference as "a guess that you make or an opinion that you form based on the information that you have", and it uses the definition of information as "facts or details about a person, company, product, etc.".

Taken together, the following classes of tweets can be identified by TACO:

  • Statement, which refers to unique cases where only the inference is presented as something that someone says or writes officially, or an action done to express an opinion.
  • Reason, which represents a full argument where the inference is based on direct information mentioned in the tweet, such as a source-reference or quotation, and thus reveals the author’s motivation to try to understand and to make judgments based on practical facts.
  • Notification, which refers to a tweet that limits itself to providing information, such as media channels promoting their latest articles.
  • None, a tweet that provides neither inference nor information.

In its entirety, TACO can classify the following hierarchy for tweets:

[Figure: Argument Tree]

Usage

Using this model becomes easy when you have transformers installed:

pip install -U transformers

Then you can use the model to generate tweet classifications like this:

from transformers import pipeline

pipe = pipeline("text-classification", model="TomatenMarc/TACO")
prediction = pipe("Huggingface is awesome")

print(prediction)

Notice: The tweets need to undergo preprocessing before classification.
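The card does not say which preprocessing steps are required. As a minimal sketch, assuming BERTweet-style normalization (user mentions become @USER and links become HTTPURL), a tweet could be cleaned as follows before it is passed to the pipeline. The function name `normalize_tweet` and both regular expressions are illustrative assumptions, not the authors' preprocessing.

```python
import re

# Assumption: BERTweet-style placeholders. The exact preprocessing used for
# TACO is not specified in this card, so treat this as an illustration only.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
MENTION_PATTERN = re.compile(r"@\w+")


def normalize_tweet(text: str) -> str:
    """Replace URLs and user mentions with placeholders, collapse whitespace."""
    text = URL_PATTERN.sub("HTTPURL", text)
    text = MENTION_PATTERN.sub("@USER", text)
    return " ".join(text.split())


example = normalize_tweet("@jack  read this: https://example.com/a?b=1 #brexit")
print(example)
```

The normalized string can then be passed to `pipe(...)` in place of the raw tweet.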

Training

The final model was trained on the entire shuffled ground-truth dataset, TACO, which comprises 1734 tweets. The topics are distributed as follows: #abortion (25.9%), #brexit (29.0%), #got (11.0%), #lotrrop (12.1%), #squidgame (12.7%), and #twittertakeover (9.3%). We used SimpleTransformers for training.

Additionally, the category and class distribution of the dataset TACO is as follows:

Argument        No-Argument
865 (49.88%)    869 (50.12%)

Reason          Statement       Notification    None
581 (33.50%)    284 (16.38%)    500 (28.84%)    369 (21.28%)

Notice: The model was trained on TACO to predict classes. The categories (Argument/No-Argument) are aggregations of these classes based on the inference component.
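The aggregation can be sketched as a simple lookup. The mapping below is inferred from the class definitions and the counts in the distribution table (Reason and Statement contain an inference; Notification and None do not). The label strings are assumptions and may differ from the identifiers the model actually returns.

```python
from collections import Counter

# Assumption: label strings match the class names used in this card.
CLASS_TO_CATEGORY = {
    "Reason": "Argument",
    "Statement": "Argument",
    "Notification": "No-Argument",
    "None": "No-Argument",
}

# Class counts taken from the distribution table.
CLASS_COUNTS = {"Reason": 581, "Statement": 284, "Notification": 500, "None": 369}


def aggregate(class_counts):
    """Sum per-class counts into their Argument/No-Argument categories."""
    totals = Counter()
    for label, count in class_counts.items():
        totals[CLASS_TO_CATEGORY[label]] += count
    return dict(totals)


category_counts = aggregate(CLASS_COUNTS)
print(category_counts)
```

The resulting totals reproduce the category row of the table, 865 Argument and 869 No-Argument tweets.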

Dataloader

"data_loader": {
    "type": "torch.utils.data.dataloader.DataLoader",
    "args": {
        "batch_size": 8,
        "sampler": "torch.utils.data.sampler.RandomSampler"
    }
}

Parameters of the fit()-Method:

{
    "epochs": 5,
    "max_grad_norm": 1,
    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
    "optimizer_params": {
        "lr": 4e-05
    },
    "scheduler": "WarmupLinear",
    "warmup_steps": 66,
    "weight_decay": 0.06
}
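To illustrate how these parameters interact, the sketch below computes the learning rate per optimizer step, assuming that "WarmupLinear" means a linear ramp from zero to the peak rate followed by a linear decay to zero. The total step count is also an assumption: one optimizer step per batch of 8, with the last partial batch kept. The helper `learning_rate` is hypothetical and not part of any library.

```python
import math

NUM_TWEETS = 1734      # size of the TACO dataset
BATCH_SIZE = 8         # from the data loader configuration
EPOCHS = 5
WARMUP_STEPS = 66
PEAK_LR = 4e-05

# Assumption: one optimizer step per batch, last partial batch kept.
TOTAL_STEPS = math.ceil(NUM_TWEETS / BATCH_SIZE) * EPOCHS


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 33, 66, 500, TOTAL_STEPS):
    print(step, learning_rate(step))
```

Under these assumptions the warmup covers roughly the first 6% of training (66 of 1085 steps).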

Evaluation

We utilized a stratified 10-fold cross-validation approach to present TACO's performance. In doing so, we employed the identical data and parameters as outlined in the Training section. This involved training on k-1 splits and utilizing the kth split for making predictions.
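The splitting procedure can be sketched in plain Python. This is an illustration of stratified k-fold splitting in general, not the authors' evaluation code: the function `stratified_folds`, the seed, and the round-robin assignment are assumptions, and the placeholder labels only reproduce the class counts reported in this card.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split indices into k folds that preserve the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets an equal share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Placeholder labels with the class counts reported in this card.
labels = (["Reason"] * 581 + ["Statement"] * 284
          + ["Notification"] * 500 + ["None"] * 369)
folds = stratified_folds(labels)

for fold_number, test_indices in enumerate(folds):
    held_out = set(test_indices)
    train_indices = [i for i in range(len(labels)) if i not in held_out]
    # Train on train_indices (k-1 folds), predict on test_indices (kth fold).
    print(fold_number, len(train_indices), len(test_indices))
```

Each tweet is held out exactly once, so the predictions from all ten folds together cover the full set of 1734 tweets.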

In total, the TACO classifier performs as follows:

Classification

               Precision  Recall  F1-Score  Support
Reason            73.69%  75.22%    74.45%      581
Statement         54.37%  59.15%    56.66%      284
Notification      79.02%  77.60%    78.30%      500
None              83.87%  77.51%    80.56%      369
-------------  ---------  ------  --------  -------
Accuracy                            73.76%     1734
Macro Avg         72.74%  72.37%    72.49%     1734
Weighted Avg      74.23%  73.76%    73.95%     1734
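As a worked check of how the two average rows relate to the per-class rows, the sketch below recomputes them from the rounded values in the classification table. The macro average is the unweighted mean over the four classes, while the weighted average weights each class by its support. The helper names are illustrative, and small deviations can arise because the inputs are already rounded to two decimals.

```python
# (precision, recall, f1, support) per class, copied from the table.
PER_CLASS = {
    "Reason": (73.69, 75.22, 74.45, 581),
    "Statement": (54.37, 59.15, 56.66, 284),
    "Notification": (79.02, 77.60, 78.30, 500),
    "None": (83.87, 77.51, 80.56, 369),
}


def macro_average(rows, column):
    """Unweighted mean of one metric column across classes."""
    return sum(row[column] for row in rows.values()) / len(rows)


def weighted_average(rows, column):
    """Mean of one metric column, weighted by class support."""
    total_support = sum(row[3] for row in rows.values())
    return sum(row[column] * row[3] for row in rows.values()) / total_support


for name, column in (("precision", 0), ("recall", 1), ("f1", 2)):
    print(name,
          round(macro_average(PER_CLASS, column), 2),
          round(weighted_average(PER_CLASS, column), 2))
```

The support-weighted recall coincides with the overall accuracy, which is why both are reported as 73.76%.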

Categorization

               Precision  Recall  F1-Score  Support
No-Argument       86.66%  82.97%    84.77%      869
Argument          83.59%  87.17%    85.34%      865
-------------  ---------  ------  --------  -------
Accuracy                            85.06%     1734
Macro Avg         85.13%  85.07%    85.06%     1734
Weighted Avg      85.13%  85.06%    85.06%     1734

Environmental Impact

Licensing

TACO © 2023 is licensed under CC BY-NC-SA 4.0

Citation

@inproceedings{feger-dietze-2024-taco,
    title = "{TACO} {--} {T}witter Arguments from {CO}nversations",
    author = "Feger, Marc  and
              Dietze, Stefan",
    editor = "Calzolari, Nicoletta  and
              Kan, Min-Yen  and
              Hoste, Veronique  and
              Lenci, Alessandro  and
              Sakti, Sakriani  and
              Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1349",
    pages = "15522--15529"
}