metadata
language: ru
tags:
- spam-detection
- text-classification
- russian
license: mit
datasets:
- RUSpam/spam_dataset_v4
metrics:
- F1
model-index:
- name: spam_deberta_v4
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: RUSpam/russian_spam_dataset
      type: RUSpam/russian_spam_dataset
    metrics:
    - name: F1
      type: F1
      value: 0.9897
RUSpam/spam_deberta_v4
Description
This is a spam detection model based on the DeBERTa architecture and fine-tuned on Russian-language spam data. It classifies a text as spam or not spam.
Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub
model_path = "RUSpam/spam_deberta_v4"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path)

def predict(text):
    # Inputs longer than 256 tokens are truncated
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    logits = outputs.logits
    # Class 1 is spam, class 0 is not spam
    predicted_class = torch.argmax(logits, dim=1).item()
    return "Spam" if predicted_class == 1 else "Not spam"

text = "Ваш текст для проверки здесь"  # Russian for "Your text to check goes here"
result = predict(text)
print(f"Result: {result}")
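The snippet in this section returns only a hard label. If a confidence score or a stricter decision threshold is useful, the two logits can be turned into a spam probability with a softmax. The sketch below is a minimal illustration in plain Python, not part of the model's API: the helper names `spam_probability` and `classify` and the `threshold` parameter are hypothetical, and it assumes index 1 is the spam class, as in the `predict` function of this card.

```python
import math

def spam_probability(logits):
    """Return P(spam) from a [not_spam, spam] logit pair using a stable softmax."""
    # Assumption: index 1 is the spam class, as in the card's predict() function.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    return exps[1] / sum(exps)

def classify(logits, threshold=0.5):
    """Label the text as spam when P(spam) reaches the (hypothetical) threshold."""
    p = spam_probability(logits)
    label = "Spam" if p >= threshold else "Not spam"
    return label, p

# Example with made-up logits; with the real model they could come from
# outputs.logits[0].tolist()
label, p = classify([-1.0, 2.0], threshold=0.9)
print(f"{label} (p = {p:.3f})")
```

Raising the threshold trades recall for precision: fewer legitimate messages get flagged, at the cost of letting more spam through.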
Citation
@MISC{RUSpam/spam_deberta_v4,
  author = {Denis Petrov and Kirill Fedko (Neurospacex) and Sergey Yalovegin},
  title = {Russian Spam Classification Model},
  url = {https://huggingface.co/RUSpam/spam_deberta_v4/},
  year = 2024
}