---
license: apache-2.0
language:
- km
metrics:
- accuracy
base_model:
- facebook/fasttext-km-vectors
pipeline_tag: text-classification
library_name: fasttext
---
**This is a fine-tuned version of the FastText KM model for sentiment analysis. It classifies Khmer text into two categories: Positive and Negative.**
- **Task**: Sentiment analysis (binary classification).
- **Languages Supported**: Khmer.
- **Intended Use Cases**:
  - Analyzing customer reviews.
  - Social media sentiment detection.
- **Limitations**:
  - Performance may degrade on languages or domains not present in the training data.
  - Does not handle sarcasm or highly ambiguous inputs well.

The model was evaluated on a test set of 400 samples, achieving the following performance:
- **Test Accuracy**: 81%
- **Precision**: 81%
- **Recall**: 81%
- **F1 Score**: 81%

Confusion Matrix:

| Predicted\Actual | Negative | Positive |
|-------------------|----------|----------|
| **Negative** | 165 | 44 |
| **Positive** | 31 | 160 |
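
The reported scores can be cross-checked against the confusion matrix. The sketch below recomputes them from the four cell counts, reading rows as predicted labels and columns as actual labels, as the table header indicates. The card does not say how precision, recall, and F1 were averaged; macro averaging over the two classes is an assumption here, and the helper name `prf` is purely illustrative.

```python
# Cell counts from the confusion matrix (rows: predicted, columns: actual)
tp = 160  # predicted Positive, actually Positive
fp = 31   # predicted Positive, actually Negative
fn = 44   # predicted Negative, actually Positive
tn = 165  # predicted Negative, actually Negative


def prf(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) for one class."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


accuracy = (tp + tn) / (tp + fp + fn + tn)

positive = prf(tp, fp, fn)
# For the Negative class the roles of the error cells swap
negative = prf(tn, fn, fp)

# Macro average (assumption: the card does not state the averaging method)
macro = tuple((p + n) / 2 for p, n in zip(positive, negative))

print(f"accuracy:        {accuracy:.4f}")
print(f"macro precision: {macro[0]:.4f}")
print(f"macro recall:    {macro[1]:.4f}")
print(f"macro F1:        {macro[2]:.4f}")
```

Accuracy comes out to 325/400 = 81.25%, and under the macro-averaging assumption precision, recall, and F1 each round to 81%, which is consistent with the figures listed above.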

The model supports a maximum sequence length of 512 tokens.
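
One way to respect this limit is to cut the token list before it is passed to the model. The following is a minimal sketch, assuming the input has already been split into a list of word tokens. `MAX_TOKENS` and `truncate_tokens` are hypothetical names introduced for illustration; they are not part of the fasttext or khmernltk APIs.

```python
MAX_TOKENS = 512  # limit stated in this model card


def truncate_tokens(tokens, max_tokens=MAX_TOKENS):
    """Keep at most max_tokens tokens and join them with single spaces."""
    return " ".join(tokens[:max_tokens])


# Usage with a dummy token list that is longer than the limit
long_input = ["tok"] * 600
text_for_model = truncate_tokens(long_input)
print(len(text_for_model.split()))
```

Truncating from the end keeps the beginning of the text. Whether the start or the end of a long review carries more sentiment depends on the data, so this choice may need adjusting.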
## How to Use
```python
from huggingface_hub import hf_hub_download
import fasttext
from khmernltk import word_tokenize

# Download the model file from the Hub and load it
model = fasttext.load_model(
    hf_hub_download("tykea/khmer-fasttext-sentiment-analysis", "model.bin")
)


def predict(text):
    # Tokenize the text into Khmer words
    tokens = word_tokenize(text)
    # Join tokens back into a single space-separated string
    tokenized_text = ' '.join(tokens)
    # Make predictions; returns a (labels, probabilities) pair
    predictions = model.predict(tokenized_text)
    # Map labels to human-readable format
    label_mapping = {
        '__label__0': 'negative',
        '__label__1': 'positive'
    }
    # Get the top predicted label
    predicted_label = predictions[0][0]
    # Map the predicted label, falling back to 'unknown'
    human_readable_label = label_mapping.get(predicted_label, 'unknown')
    return human_readable_label


# Example call; replace the placeholder with the Khmer text to classify
predict('<Khmer text>')
```