This is a fine-tuned version of the XLM-RoBERTa model for sentiment analysis to classify khmer texts into 2 categories; Postive and Negative.

It can process texts up to 512 tokens and performs well on khmer text inputs.

  • Task: Sentiment analysis (binary classification).

  • Languages Supported: Khmer.

  • Intended Use Cases:

    • Analyzing customer reviews.
    • Social media sentiment detection.
  • Limitations: - Performance may degrade on languages or domains not present in the training data. - Does not handle sarcasm or highly ambiguous inputs well.

    The model was evaluated on a test set of 400 samples, achieving the following performance:

  • Test Accuracy: 83.25%

  • Precision: 83.55%

  • Recall: 83.25%

  • F1 Score: 83.25%

Confusion Matrix:

Predicted\Actual Negative Positive
Negative 166 42
Positive 25 167
The model supports a maximum sequence length of 512 tokens.

How to Use

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("tykea/khmer-text-sentiment-analysis-roberta")
model = AutoModelForSequenceClassification.from_pretrained("tykea/khmer-text-sentiment-analysis-roberta")

text = "អគុណCADT"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
outputs = model(**inputs)
predictions = outputs.logits.argmax(dim=1)
labels_mapping = {0: 'negative', 1: 'positive'}
print("Predicted Class:", labels_mapping[predictions.item()])
Downloads last month
14
Safetensors
Model size
278M params
Tensor type
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for tykea/khmer-text-sentiment-analysis-roberta

Finetuned
(2698)
this model