---
language:
- en
- es
---

# UPB's Multi-task Learning model for AuTexTification

This is a model for classifying text as human- or LLM-generated. It was trained for one of University Politehnica of Bucharest's (UPB) submissions to the [AuTexTification shared task](https://sites.google.com/view/autextification/home).

The model was trained using multi-task learning to predict whether a text document was written by a human or a large language model, and whether it was written in English or Spanish. It outputs a score/probability for each task, and it also makes a binary prediction for detecting synthetic text, based on a threshold.

## Training data

The model was trained on approximately 33,845 English documents and 32,062 Spanish documents, covering five domains, such as legal and social media. The dataset is available on Zenodo (more instructions [here](https://sites.google.com/view/autextification/data)).

## Evaluation results

These results were computed as part of the [AuTexTification shared task](https://sites.google.com/view/autextification/results):

| Language | Macro F1 | Confidence interval |
|:---------|:--------:|:-------------------:|
| English  | 65.53    | (64.92, 66.23)      |
| Spanish  | 65.01    | (64.58, 65.64)      |

## Using the model

You can load the model and its tokenizer using `AutoModel` and `AutoTokenizer`.
This is an example of using the model for inference:

```python
import torch
from transformers import AutoModel, AutoTokenizer

checkpoint = "pandrei7/autextification-upb-mtl"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# trust_remote_code is needed because the model uses custom code from the repo.
model = AutoModel.from_pretrained(checkpoint, trust_remote_code=True)

texts = ["Enter your text here."]
tokenized_batch = tokenizer(
    texts,
    padding=True,
    truncation=True,
    max_length=512,
    return_tensors="pt",
)

# Run inference without tracking gradients.
model.eval()
with torch.no_grad():
    preds = model(tokenized_batch)

print("Bot?\t", preds["is_bot"][0].item())
print("Bot score\t", preds["bot_prob"][0].item())
print("English score\t", preds["english_prob"][0].item())
```
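If you prefer to apply your own decision rule rather than the model's built-in `is_bot` prediction, you can threshold the `bot_prob` scores directly. The sketch below is an illustration only: the `classify_with_threshold` helper and the 0.5 cutoff are assumptions, since the card does not state which threshold the model uses internally.

```python
def classify_with_threshold(bot_probs, threshold=0.5):
    """Flag texts as bot-written when their bot probability meets the threshold.

    bot_probs: a list of floats, e.g. the per-text values of preds["bot_prob"].
    The 0.5 default threshold is a hypothetical choice, not the model's own.
    """
    return [p >= threshold for p in bot_probs]

print(classify_with_threshold([0.2, 0.9]))  # → [False, True]
```

Raising the threshold trades recall for precision: fewer texts get flagged as synthetic, but those that do are flagged with higher confidence.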