Winning model for the COLING 2025 Workshop on Detecting AI Generated Content (DAIGenC)
Model description
A binary classifier for detecting machine-generated text fragments. It achieved first place on the monolingual subtask of the COLING 2025 GenAI Detection Task. The model is a version of DeBERTa-v3-base fine-tuned in multi-task mode, with a shared encoder and three parallel classification heads. Only one of the heads is used for inference.
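The multi-task layout described above can be sketched roughly as follows. This is an illustration only, not the released training code: `TinyEncoder`, `MultiHeadClassifier`, the head names, and the class counts of the two auxiliary heads are hypothetical (the card does not say what the extra heads predict), and a small embedding layer stands in for DeBERTa-v3-base so that the snippet runs without downloading weights.

```python
import torch
from torch import nn


class TinyEncoder(nn.Module):
    """Stand-in for the shared encoder (hypothetical; not DeBERTa)."""

    def __init__(self, vocab_size=100, hidden_size=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)

    def forward(self, input_ids):
        # Mean-pool token embeddings into one vector per sequence.
        return self.embed(input_ids).mean(dim=1)


class MultiHeadClassifier(nn.Module):
    """One shared encoder feeding several parallel classification heads."""

    def __init__(self, encoder, hidden_size, head_sizes):
        super().__init__()
        self.encoder = encoder
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden_size, n) for name, n in head_sizes.items()}
        )

    def forward(self, input_ids, head=None):
        pooled = self.encoder(input_ids)
        if head is not None:
            # Inference path: evaluate a single head only.
            return self.heads[head](pooled)
        # Training path: every head sees the same pooled representation.
        return {name: h(pooled) for name, h in self.heads.items()}


# Head names and auxiliary class counts are assumptions for illustration.
head_sizes = {"binary": 2, "aux_a": 5, "aux_b": 3}
model = MultiHeadClassifier(TinyEncoder(), hidden_size=32, head_sizes=head_sizes)
model.eval()

input_ids = torch.randint(0, 100, (4, 16))  # 4 sequences of 16 token ids
with torch.no_grad():
    all_logits = model(input_ids)                    # dict with one entry per head
    binary_logits = model(input_ids, head="binary")  # shape (4, 2)
```

During training, the losses of all heads would typically be combined so that the shared encoder receives gradients from every task; at inference, only the binary head is evaluated.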
Usage
import torch
from transformers import AutoTokenizer, DebertaV2ForSequenceClassification
# DeBERTa classifier whose default linear head is replaced by a small MLP.
class MLayerDebertaV2ForSequenceClassification(
    DebertaV2ForSequenceClassification
):
    def __init__(self, config, **kwargs):
        super().__init__(config)
        self.classifier = torch.nn.Sequential(
            torch.nn.Linear(config.hidden_size, 512),
            torch.nn.GELU(),
            torch.nn.Linear(512, 256),
            torch.nn.GELU(),
            torch.nn.Dropout(0.5),
            torch.nn.Linear(256, 2)
        )
tokenizer = AutoTokenizer.from_pretrained(
    "OU-Advacheck/deberta-v3-base-daigenc-mgt1a"
)
model = MLayerDebertaV2ForSequenceClassification.from_pretrained(
    "OU-Advacheck/deberta-v3-base-daigenc-mgt1a"
)
model.eval()  # disable dropout for inference
inputs = tokenizer(
    ['Hello, Thanks for sharing your health concern with us. I have gone through your query and here are your answers: 1. If you have regular cycles, there is no further need to use any medication to regulate cycles. 2. Establishment of regular ovulation and timing of intercourse properly is necessary. 3. If you want to conceive quickly, you have to get further evaluation and plan management. Hope this helps.',
     'He might have small intestinal TB rather than stomach TB. Amoebas also involves small intestine/some part of large intestine. If he has taken medicines for both diseases in form of a Complete Course, he should be fine. U can go for an oral+iv contrast CT scan of him. Now, the diagnosis of a lax cardiac can be confirmed by an upper GI endoscopy with manometry (if available). Lax cardiac may cause acidity with reflux.'],
    max_length=512,
    truncation=True,
    padding="max_length",
    return_tensors="pt"
)
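# Probability of class 1 (machine-generated) for each input text.
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=1)[:, 1].tolist()
print(probs)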
Limitations and bias
This model is limited by its training dataset, which consists of machine-generated and human-written texts from different sources and domains collected over a particular period of time. It may not be a good fit for every use case or domain. In addition, the model may produce false positives; their rate can be adjusted through the classification threshold.
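As a minimal sketch of how the threshold affects false positives, the snippet below applies two thresholds to the same scores. The probabilities and labels are made up for illustration, and `predict_labels` and `false_positive_rate` are hypothetical helpers, not part of the released code.

```python
def predict_labels(machine_probs, threshold=0.5):
    """Label a text as machine (1) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in machine_probs]


def false_positive_rate(y_true, y_pred):
    """Share of human texts (label 0) that were flagged as machine (label 1)."""
    human_preds = [pred for true, pred in zip(y_true, y_pred) if true == 0]
    if not human_preds:
        return 0.0
    return sum(human_preds) / len(human_preds)


# Made-up machine-class probabilities and gold labels (0 = human, 1 = machine).
probs = [0.10, 0.60, 0.95, 0.97, 0.40, 0.93]
labels = [0, 0, 0, 1, 1, 1]

fpr_default = false_positive_rate(labels, predict_labels(probs, threshold=0.5))
fpr_strict = false_positive_rate(labels, predict_labels(probs, threshold=0.92))
print(fpr_default, fpr_strict)
```

On this toy data the stricter threshold flags fewer human texts. In general, raising the threshold trades false positives for missed machine-generated texts, so it should be tuned on held-out data from the target domain.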
Quality
Quality on the declared test set of the competition (with a 0.92 probability threshold).
Model | Main Score (F1 Macro) | Auxiliary Score (F1 Micro) |
---|---|---|
MTL DeBERTa-v3-base (ours) | 0.8307 | 0.8311 |
Single-task DeBERTa-v3-base | 0.7852 | 0.7891 |
baseline | 0.7342 | 0.7343 |
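For readers unfamiliar with the two scores, the sketch below computes macro and micro F1 from scratch on a small made-up example. The labels are invented and the helper names are hypothetical; the competition's own scorer may be implemented differently.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score of a single class, treating it as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


def micro_f1(y_true, y_pred, classes=(0, 1)):
    """F1 over counts pooled across classes."""
    tp = fp = fn = 0
    for c in classes:
        tp += sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp += sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn += sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Made-up gold labels and predictions (0 = human, 1 = machine).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

macro = macro_f1(y_true, y_pred)
micro = micro_f1(y_true, y_pred)
print(round(macro, 4), round(micro, 4))
```

In single-label classification, micro F1 equals plain accuracy, while macro F1 weights both classes equally regardless of how many examples each has.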
Training procedure
This model was fine-tuned on the train part of the English version of the competition data, the MGT Detection Task 1 dataset. Class 0 is human, class 1 is machine. The model was fine-tuned in 2 stages on a single NVIDIA RTX 3090 GPU, with the hyperparameters described in our paper.
Your Own Fine-Tune
If you would like to fine-tune this architecture on your own data domains or with other base models, we provide our training and inference code, with full instructions, on GitHub.
Citation
If you use these results in your research, please cite our paper:
@misc{gritsai2024advacheckgenaidetectiontask,
title={Advacheck at GenAI Detection Task 1: AI Detection Powered by Domain-Aware Multi-Tasking},
author={German Gritsai and Anastasia Voznyuk and Ildar Khabutdinov and Andrey Grabovoy},
year={2024},
eprint={2411.11736},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2411.11736},
}