---
license: apache-2.0
language:
- km
metrics:
- accuracy
base_model:
- facebook/fasttext-km-vectors
pipeline_tag: text-classification
library_name: fasttext
---

**This model is a fine-tuned version of the FastText KM model for sentiment analysis. It classifies Khmer text into two categories: Positive and Negative.**

- **Task**: Sentiment analysis (binary classification).
- **Languages Supported**: Khmer.
- **Intended Use Cases**:
  - Analyzing customer reviews.
  - Social media sentiment detection.
- **Limitations**:
  - Performance may degrade on languages or domains not present in the training data.
  - Does not handle sarcasm or highly ambiguous inputs well.

The model was evaluated on a test set of 400 samples, achieving the following performance:

- **Test Accuracy**: 81%
- **Precision**: 81%
- **Recall**: 81%
- **F1 Score**: 81%

Confusion Matrix:

| Predicted \ Actual | Negative | Positive |
|--------------------|----------|----------|
| **Negative**       | 165      | 44       |
| **Positive**       | 31       | 160      |
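
The reported figures can be re-derived from the confusion matrix above. The sketch below is only an illustration: it assumes the precision, recall, and F1 values are macro-averaged over the two classes, which the card does not state, and the helper name `per_class_metrics` is hypothetical.

```python
# Confusion matrix from the table above:
# outer key = predicted label, inner key = actual label.
confusion = {
    "negative": {"negative": 165, "positive": 44},
    "positive": {"negative": 31, "positive": 160},
}
labels = list(confusion)


def per_class_metrics(matrix, label):
    """Return (precision, recall, f1) for one class (hypothetical helper)."""
    true_positives = matrix[label][label]
    predicted_total = sum(matrix[label].values())            # row sum
    actual_total = sum(matrix[p][label] for p in matrix)     # column sum
    precision = true_positives / predicted_total
    recall = true_positives / actual_total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = sum(sum(row.values()) for row in confusion.values())
accuracy = sum(confusion[label][label] for label in labels) / total

# Assumption: macro averaging (unweighted mean over both classes).
per_class = [per_class_metrics(confusion, label) for label in labels]
macro_precision, macro_recall, macro_f1 = (
    sum(values) / len(labels) for values in zip(*per_class)
)

print(f"samples={total} accuracy={accuracy:.2%}")
print(f"precision={macro_precision:.2%} recall={macro_recall:.2%} f1={macro_f1:.2%}")
```

Under that assumption, all four values round to the 81% reported above, and the matrix entries sum to the 400 test samples.
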

The model supports a maximum sequence length of 512 tokens.

## How to Use

```python
from huggingface_hub import hf_hub_download
import fasttext
from khmernltk import word_tokenize

# Download the model file from the Hugging Face Hub and load it with fastText
model = fasttext.load_model(hf_hub_download("tykea/khmer-fasttext-sentiment-analysis", "model.bin"))

# Map fastText labels to a human-readable format
label_mapping = {
    '__label__0': 'negative',
    '__label__1': 'positive'
}

def predict(text):
    # Tokenize the text into Khmer words
    tokens = word_tokenize(text)
    # Join tokens back into a single space-separated string;
    # fastText predicts one line at a time, so replace any newlines
    tokenized_text = ' '.join(tokens).replace('\n', ' ')
    # Make predictions: returns a (labels, probabilities) pair
    predictions = model.predict(tokenized_text)
    # Get the top predicted label
    predicted_label = predictions[0][0]
    # Map the predicted label, falling back to 'unknown'
    human_readable_label = label_mapping.get(predicted_label, 'unknown')
    return human_readable_label

# Replace '...' with the Khmer text to classify
predict('...')
```