metadata
language:
  - ru
library_name: fasttext
pipeline_tag: text-classification
tags:
  - news
  - media
  - russian
  - multilingual

FastText Text Classifier

This is a FastText text classification model hosted on Hugging Face Hub. It was trained on my news dataset, a well-balanced sample of news from the last five years.

Model Description

This model uses FastText to classify text into 11 categories. It was trained on about 70,000 examples and achieves an accuracy of 0.8691 on a test dataset.
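For reference, accuracy here means the share of test examples whose predicted category matches the true one. The sketch below shows that calculation on a few made-up labels; the `accuracy` helper and the sample data are illustrative assumptions, not the evaluation script or test set behind the reported figure.

```python
def accuracy(predicted, expected):
    """Return the fraction of examples whose predicted label matches the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Hypothetical labels, for illustration only (not the author's test set).
predicted = ["sports", "politics", "economy", "society"]
expected = ["sports", "politics", "economy", "conflicts"]
print(accuracy(predicted, expected))  # 0.75
```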

Task

The model is designed to classify Russian-language news articles into 11 categories.

Categories

The classifier assigns each news item to one of 11 categories:

  • climate (климат)
  • conflicts (конфликты)
  • culture (культура)
  • economy (экономика)
  • gloss (глянец)
  • health (здоровье)
  • politics (политика)
  • science (наука)
  • society (общество)
  • sports (спорт)
  • travel (путешествия)
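The list above can be kept in code as a lookup table, which is handy for showing the Russian name next to a predicted label. This is a minimal sketch: the names `CATEGORY_NAMES_RU` and `describe_label` are hypothetical helpers, not part of the model or of fasttext. The `__label__` prefix is the one stripped in the usage example.

```python
# English label -> Russian display name, copied from the category list above.
CATEGORY_NAMES_RU = {
    "climate": "климат",
    "conflicts": "конфликты",
    "culture": "культура",
    "economy": "экономика",
    "gloss": "глянец",
    "health": "здоровье",
    "politics": "политика",
    "science": "наука",
    "society": "общество",
    "sports": "спорт",
    "travel": "путешествия",
}

LABEL_PREFIX = "__label__"


def describe_label(raw_label):
    """Strip the fastText label prefix and return (label, Russian name)."""
    label = raw_label
    if label.startswith(LABEL_PREFIX):
        label = label[len(LABEL_PREFIX):]
    # Unknown labels map to None instead of raising.
    return label, CATEGORY_NAMES_RU.get(label)


print(describe_label("__label__sports"))  # ('sports', 'спорт')
```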

Intended uses & limitations

The "gloss" category captures yellow-press (tabloid), trashy, and dubious news. The model can confuse the politics, society, and conflicts categories with one another.
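Because these categories overlap, it can help to look at several top predictions rather than only the first. The sketch below assumes a loaded fastText model object whose `predict` method accepts a `k` argument; `top_k_labels` is a hypothetical helper name, and the default `k=3` is an arbitrary choice for illustration.

```python
def top_k_labels(model, text, k=3):
    """Return up to k (label, score) pairs for a single text, best first."""
    # fastText predicts one line at a time, so newlines are replaced with spaces.
    labels, scores = model.predict(text.replace("\n", " "), k=k)
    return [
        (label.replace("__label__", ""), float(score))
        for label, score in zip(labels, scores)
    ]
```

With the classifier from the Usage section, this could be called as `top_k_labels(classifier.model, text)`. If the top two scores are close, the article probably sits on the border between categories.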

Usage

To use this model, you will need the fasttext library and huggingface_hub, which is installed as a dependency of transformers. Install them using pip:

pip install fasttext transformers

Example of how to use the model:

from huggingface_hub import hf_hub_download
import fasttext


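class FastTextClassifierPipeline:
    """Minimal pipeline-style wrapper around a FastText classifier."""

    def __init__(self, model_path):
        self.model = fasttext.load_model(model_path)

    def __call__(self, texts):
        # Accept a single string as well as a list of strings.
        if isinstance(texts, str):
            texts = [texts]

        results = []
        for text in texts:
            # fastText predicts one line at a time and rejects newline characters.
            prediction = self.model.predict(text.replace("\n", " "))
            label = prediction[0][0].replace("__label__", "")
            score = float(prediction[1][0])
            results.append({"label": label, "score": score})

        return results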


def pipeline(task="text-classification", model=None):
    # Download the model file from the Hub (the task and model arguments are currently unused)
    repo_id = "data-silence/fasttext-rus-news-classifier"
    model_file = hf_hub_download(repo_id=repo_id, filename="fasttext_news_classifier.bin")
    return FastTextClassifierPipeline(model_file)


# Create the classifier
classifier = pipeline("text-classification")

# Use the classifier
text = "В Париже завершилась церемония закрытия Олимпийских игр"
result = classifier(text)
print(result)
# [{'label': 'sports', 'score': 1.0000100135803223}]
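
The sample output above shows a score slightly greater than 1.0. If downstream code expects a probability in the range [0, 1], the results can be post-processed. The following is a minimal sketch: `normalize_results` is a hypothetical helper, and the `min_score` threshold of 0.5 is an arbitrary assumption, not a value recommended for this model.

```python
def normalize_results(results, min_score=0.5):
    """Clamp scores to [0, 1] and flag predictions below a confidence threshold."""
    normalized = []
    for item in results:
        score = min(max(item["score"], 0.0), 1.0)
        normalized.append(
            {"label": item["label"], "score": score, "confident": score >= min_score}
        )
    return normalized


# Result copied from the usage example above.
sample = [{"label": "sports", "score": 1.0000100135803223}]
print(normalize_results(sample))
# [{'label': 'sports', 'score': 1.0, 'confident': True}]
```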

Contacts

If you have any questions or suggestions for improving the model, please create an issue in this repository or contact me at [email protected].