Sentiment Analysis with Fine-tuned Multilingual BERT for Georgian 🇬🇪

📄 Model Overview

This is a BERT model for Georgian sentiment analysis, obtained by fine-tuning bert-base-multilingual-cased on the Georgian Sentiment Analysis dataset.

  • Base Model: bert-base-multilingual-cased
  • Fine-tuned on: Arseniy-Sandalov/Georgian-Sentiment-Analysis
  • Task: Sentiment classification (positive, negative, neutral)
  • Tokenizer: BERT multilingual cased tokenizer
  • License: see the dataset source

👉 Usage Example

You can load and use this model with Hugging Face Transformers:

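from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "Arseniy-Sandalov/GeorgianBert-Sent"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Label order as given on this card (class id 0, 1, 2);
# check model.config.id2label if in doubt.
LABELS = ["negative", "neutral", "positive"]

def predict_sentiment(text):
    # Tokenize a single string, truncating to BERT's 512-token limit.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    # logits has shape (batch_size, num_labels); pick the highest-scoring class.
    prediction = torch.argmax(outputs.logits, dim=-1).item()
    return LABELS[prediction]
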
text = "แƒแƒฎแƒแƒšแƒ˜ แƒ›แƒ”แƒแƒ แƒ˜ แƒ™แƒแƒ แƒ’แƒ˜แƒ แƒ”แƒ แƒ—แƒ˜แƒšแƒ"
print(predict_sentiment(text))
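`predict_sentiment` returns only the arg-max label. If you also want class probabilities, the logits can be passed through a softmax. The sketch below does this in plain Python on a list of logits, so it runs without downloading the model; with the real model you would pass `outputs.logits[0].tolist()`. The `decode_prediction` helper, the `ID2LABEL` mapping and the logit values are hypothetical; the mapping simply mirrors the label order used in the usage example. If the checkpoint's `model.config.id2label` holds meaningful names you could pass that instead, but if it only stores generic names such as `LABEL_0`, an explicit mapping like this one is needed.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_prediction(
    logits: Sequence[float], id2label: Dict[int, str]
) -> Tuple[str, Dict[str, float]]:
    """Return the arg-max label and a label -> probability mapping for one example."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], {id2label[i]: p for i, p in enumerate(probs)}


# Hypothetical mapping that mirrors the label order in the usage example.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}

# Made-up logits standing in for outputs.logits[0].tolist().
label, probs = decode_prediction([-1.2, 0.3, 2.5], ID2LABEL)
print(label)  # positive
```

Softmax does not change which class has the highest score, so the label always matches the arg-max used by `predict_sentiment`; it only adds calibrated-looking scores on top.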

📊 Training Details

Dataset Preprocessing:

  • Removed irrelevant columns (e.g., perturbation)

  • Stratified split: 80% train, 10% validation, 10% test
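The two preprocessing steps above can be sketched in plain Python, assuming each example is a dict with hypothetical `text` and `label` keys plus the unused `perturbation` column. The split shuffles each label group separately and cuts it 80/10/10, so every split keeps roughly the original class proportions. The function names, column names and seed are illustrative, not taken from the actual training code.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def drop_columns(rows: Sequence[Row], columns: Sequence[str]) -> List[Row]:
    """Return new rows without the given keys (e.g. an unused 'perturbation' column)."""
    drop = set(columns)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


def stratified_split(
    rows: Sequence[Row],
    label_key: str = "label",
    fractions: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 42,
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split into train/validation/test so each split keeps the per-label proportions."""
    by_label: Dict[object, List[Row]] = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train: List[Row] = []
    val: List[Row] = []
    test: List[Row] = []
    for label in sorted(by_label, key=str):  # stable label order, so the seed fixes the result
        group = list(by_label[label])
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        train.extend(group[:n_train])
        val.extend(group[n_train:n_train + n_val])
        test.extend(group[n_train + n_val:])  # the remainder goes to test
    return train, val, test


# Toy data: 100 examples per label, with an extra column to drop.
rows = [
    {"text": f"example {i}", "label": label, "perturbation": None}
    for label in ("negative", "neutral", "positive")
    for i in range(100)
]
rows = drop_columns(rows, ["perturbation"])
train, val, test = stratified_split(rows)
print(len(train), len(val), len(test))  # 240 30 30
```

With the Hugging Face `datasets` library, `Dataset.remove_columns` and two calls to `Dataset.train_test_split(..., stratify_by_column="label")` would cover the same steps, provided `label` is a `ClassLabel` column.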

Evaluation Metric:

  • ROC AUC score (computed on the validation and test sets)
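The card does not say how ROC AUC was averaged over the three classes. One common choice, assumed here, is one-vs-rest with a macro average: each class is scored against the other two using its predicted probability, and the per-class AUCs are averaged with equal weight. The sketch computes each binary AUC as the fraction of (positive, negative) pairs that are ranked correctly, counting ties as half. It assumes integer class ids 0 to C-1 that index the probability columns, and it is quadratic in the number of examples, which is fine for illustration.

```python
from typing import List, Sequence


def binary_roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_roc_auc(labels: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """One-vs-rest AUC per class, averaged with equal weight (macro average).

    labels[i] is the true class id of example i; probs[i][c] is its predicted
    probability for class c.
    """
    n_classes = len(probs[0])
    aucs: List[float] = []
    for c in range(n_classes):
        is_c = [1 if label == c else 0 for label in labels]  # class c vs. the rest
        aucs.append(binary_roc_auc(is_c, [row[c] for row in probs]))
    return sum(aucs) / n_classes


print(binary_roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

labels = [0, 1, 2, 0, 1, 2]
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.6, 0.3, 0.1],
    [0.3, 0.5, 0.2],
    [0.1, 0.3, 0.6],
]
print(macro_ovr_roc_auc(labels, probs))  # 1.0
```

In practice, scikit-learn's `roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")` computes the same one-vs-rest macro quantity efficiently.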

📖 Citation

If you use this model, please cite the original dataset:

@misc{Stefanovitch2023Sentiment,
  author = {Stefanovitch, Nicolas and Piskorski, Jakub and Kharazi, Sopho},
  title = {Sentiment analysis for Georgian},
  year = {2023},
  publisher = {European Commission, Joint Research Centre (JRC)},
  howpublished = {\url{http://data.europa.eu/89h/9f04066a-8cc0-4669-99b4-f1f0627fdbbf}},
  url = {http://data.europa.eu/89h/9f04066a-8cc0-4669-99b4-f1f0627fdbbf},
  type = {dataset},
  note = {PID: http://data.europa.eu/89h/9f04066a-8cc0-4669-99b4-f1f0627fdbbf}
}

Model size: 178M params (Safetensors, tensor type F32)