This is a fine-tuned version of the XLM-RoBERTa model for sentiment analysis, classifying Khmer text into two categories: Positive and Negative.
It can process texts of up to 512 tokens and performs well on Khmer text inputs.
Task: Sentiment analysis (binary classification).
Languages Supported: Khmer.
Intended Use Cases:
- Analyzing customer reviews.
- Social media sentiment detection (see the batch-scoring sketch below).

Limitations:
- Performance may degrade on languages or domains not present in the training data.
- Does not handle sarcasm or highly ambiguous inputs well.
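For batch scoring of reviews or social-media posts, the `pipeline` API is a convenient wrapper. The sketch below is a minimal example, assuming the checkpoint ships its own tokenizer and label mapping; the second example text is an illustrative placeholder, and labels may be returned as `LABEL_0`/`LABEL_1` depending on the checkpoint's config.

```python
from transformers import pipeline

# Minimal batch-scoring sketch for the intended use cases above.
classifier = pipeline(
    "text-classification",
    model="tykea/khmer-text-sentiment-analysis-roberta",
)

reviews = [
    "អគុណCADT",            # example input from this card
    "ខ្ញុំមិនពេញចិត្តទេ",    # "I am not satisfied" (illustrative placeholder)
]

# Truncate to the model's 512-token limit when scoring longer posts.
for review, result in zip(reviews, classifier(reviews, truncation=True, max_length=512)):
    print(review, "->", result["label"], round(result["score"], 3))
```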
The model was evaluated on a test set of 400 samples, achieving the following performance:
Test Accuracy: 83.25%
Precision: 83.55%
Recall: 83.25%
F1 Score: 83.25%
Confusion Matrix:
| Predicted \ Actual | Negative | Positive |
|---|---|---|
| Negative | 166 | 42 |
| Positive | 25 | 167 |

The model supports a maximum sequence length of 512 tokens.
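The figures above can be recomputed from model predictions with scikit-learn. The sketch below uses toy placeholder labels rather than the real test set, and the weighted averaging is an assumption about how the reported precision, recall, and F1 were aggregated:

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_recall_fscore_support,
)

# Toy placeholder labels (0 = negative, 1 = positive); substitute the real
# 400-sample test-set labels and model predictions to reproduce the table.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted"
)

print(f"Accuracy:  {accuracy:.2%}")
print(f"Precision: {precision:.2%}")
print(f"Recall:    {recall:.2%}")
print(f"F1 Score:  {f1:.2%}")
print(confusion_matrix(y_true, y_pred))
```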
How to Use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and fine-tuned classifier from the Hub
tokenizer = AutoTokenizer.from_pretrained("tykea/khmer-text-sentiment-analysis-roberta")
model = AutoModelForSequenceClassification.from_pretrained("tykea/khmer-text-sentiment-analysis-roberta")

# Tokenize the input, truncating to the model's 512-token limit
text = "អគុណCADT"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Run the model and take the class with the highest logit
outputs = model(**inputs)
predictions = outputs.logits.argmax(dim=1)

labels_mapping = {0: 'negative', 1: 'positive'}
print("Predicted Class:", labels_mapping[predictions.item()])
```
Model tree for tykea/khmer-text-sentiment-analysis-roberta
Base model: FacebookAI/xlm-roberta-base