🏆 Winning model for the COLING 2025 Workshop on Detecting AI Generated Content (DAIGenC)

Model description

A binary classifier for machine-generated text fragments that took first place in the monolingual subtask of the COLING 2025 GenAI Detection Task. The model is DeBERTa-v3-base fine-tuned in a multi-task setup with a shared encoder and three parallel classification heads. Only one head is used for inference.

Usage

import torch
from transformers import AutoTokenizer, DebertaV2ForSequenceClassification

class MLayerDebertaV2ForSequenceClassification(
    DebertaV2ForSequenceClassification
):
    def __init__(self, config, **kwargs):
        super().__init__(config)
        # Replace the default linear classifier with a three-layer MLP head
        self.classifier = torch.nn.Sequential(
            torch.nn.Linear(config.hidden_size, 512),
            torch.nn.GELU(),
            torch.nn.Linear(512, 256),
            torch.nn.GELU(),
            torch.nn.Dropout(0.5),
            torch.nn.Linear(256, 2)
        )

tokenizer = AutoTokenizer.from_pretrained(
    "OU-Advacheck/deberta-v3-base-daigenc-mgt1a"
)
model = MLayerDebertaV2ForSequenceClassification.from_pretrained(
    "OU-Advacheck/deberta-v3-base-daigenc-mgt1a"
)
model.eval()

inputs = tokenizer(
    ['Hello, Thanks for sharing your health concern with us. I have gone through your query and here are your answers: 1. If you have regular cycles, there is no further need to use any medication to regulate cycles. 2. Establishment of regular ovulation and timing of intercourse properly is necessary. 3. If you want to conceive quickly, you have to get further evaluation and plan management. Hope this helps.',
     'He might have small intestinal TB rather than stomach TB. Amoebas also involves small intestine/some part of large intestine. If he has taken medicines for both diseases in form of a Complete Course, he should be fine. U can go for an oral+iv contrast CT scan of him. Now, the diagnosis of a lax cardiac can be confirmed by an upper GI endoscopy with manometry (if available). Lax cardiac may cause acidity with reflux.'],
    max_length=512,
    truncation=True,
    padding="max_length",
    return_tensors="pt"
)

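# Probability of class 1 (machine-generated) for each input text
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=1)[:, 1].cpu().tolist()
print(probs)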

Limitations and bias

This model is limited by its training dataset, which consists of machine-generated and human-written texts from different sources and domains, collected over a limited period of time. It may not be a good fit for every use case or domain. In addition, the model may produce false positives in some cases; their rate can be adjusted through the classification threshold.
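As a minimal sketch of how the classification threshold trades off false positives, the snippet below turns class-1 probabilities into labels. The helper names and the example probabilities are hypothetical (they are not model outputs), and treating a probability equal to the threshold as "machine" is an assumption. The default of 0.92 is the threshold reported in the Quality section.

```python
def label_from_probability(p_machine: float, threshold: float = 0.92) -> int:
    """Return 1 (machine) if the class-1 probability reaches the threshold, else 0 (human).

    Using >= rather than > at the boundary is an assumption of this sketch.
    """
    return int(p_machine >= threshold)


def label_batch(probs, threshold: float = 0.92):
    """Apply the same threshold to a list of class-1 probabilities."""
    return [label_from_probability(p, threshold) for p in probs]


# Hypothetical probabilities for illustration only
example_probs = [0.99, 0.95, 0.60, 0.05]

print(label_batch(example_probs))        # strict threshold: fewer texts flagged as machine
print(label_batch(example_probs, 0.5))   # looser threshold: more texts flagged as machine
```

A higher threshold flags fewer texts as machine-generated, which lowers the false positive rate at the cost of missing some machine-generated texts.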

Quality

Quality on the test set declared in the competition (with a 0.92 probability threshold).

Model	Main Score (F1 Macro)	Auxiliary Score (F1 Micro)
MTL DeBERTa-v3-base (ours)	0.8307	0.8311
Single-task DeBERTa-v3-base	0.7852	0.7891
baseline	0.7342	0.7343
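
For readers unfamiliar with the two metrics in the table, here is a small sketch of how macro and micro F1 differ for a two-class problem. It is written from the standard definitions and is not the competition's scoring script; the function name and the toy labels are hypothetical.

```python
def f1_macro_micro(y_true, y_pred, labels=(0, 1)):
    """Compute (macro F1, micro F1) from the standard definitions."""
    per_class_f1 = []
    tp_sum = fp_sum = fn_sum = 0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn

    # Macro: unweighted mean of per-class F1, so each class counts equally
    macro = sum(per_class_f1) / len(per_class_f1)

    # Micro: F1 over pooled counts; for single-label tasks this equals accuracy
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / micro_denom if micro_denom else 0.0
    return macro, micro


# Toy labels for illustration only (0 = human, 1 = machine)
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
macro, micro = f1_macro_micro(y_true, y_pred)
print(f"macro F1 = {macro:.4f}, micro F1 = {micro:.4f}")
```

Macro F1 weights both classes equally regardless of their size, so it is more sensitive to errors on the rarer class than micro F1.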

Training procedure

This model was fine-tuned on the train split of the English version of the competition data (the MGT Detection Task 1 dataset). Class 0 is human, class 1 is machine. The model was fine-tuned in two stages on a single NVIDIA RTX 3090 GPU, with the hyperparameters described in our paper.

Your Own Fine-Tune

If you would like to fine-tune this architecture on your own data domains or base models, our training and inference code, with full instructions, is available on GitHub.

Citation

If you use these results in your research, please cite our paper:

@misc{gritsai2024advacheckgenaidetectiontask,
      title={Advacheck at GenAI Detection Task 1: AI Detection Powered by Domain-Aware Multi-Tasking}, 
      author={German Gritsai and Anastasia Voznyuk and Ildar Khabutdinov and Andrey Grabovoy},
      year={2024},
      eprint={2411.11736},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.11736}, 
}
