---
license: apache-2.0
language:
- km
metrics:
- accuracy
base_model:
- facebook/fasttext-km-vectors
pipeline_tag: text-classification
library_name: fasttext
---

**This is a fine-tuned version of the FastText KM model for sentiment analysis. It classifies Khmer text into two categories: Positive and Negative.**

- **Task**: Sentiment analysis (binary classification).
- **Languages Supported**: Khmer.
- **Intended Use Cases**: 
  - Analyzing customer reviews.
  - Social media sentiment detection.
- **Limitations**: 
  - Performance may degrade on languages or domains not present in the training data.
  - Does not handle sarcasm or highly ambiguous inputs well.

The model was evaluated on a test set of 400 samples, achieving the following performance:

- **Test Accuracy**: 81%
- **Precision**: 81%
- **Recall**: 81%
- **F1 Score**: 81%

Confusion matrix (rows are predicted labels, columns are actual labels):

| Predicted\Actual | Negative | Positive |
|------------------|----------|----------|
| **Negative**     | 165      | 44       |
| **Positive**     | 31       | 160      |
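
The reported scores can be cross-checked against the confusion matrix. The sketch below recomputes accuracy and per-class precision, recall, and F1 from the four counts in the table. The model card does not say how the class scores were averaged, so the macro-average used here is an assumption, and the variable and function names are illustrative only.

```python
# Counts copied from the confusion matrix: keys are (predicted, actual).
confusion = {
    ("negative", "negative"): 165,
    ("negative", "positive"): 44,
    ("positive", "negative"): 31,
    ("positive", "positive"): 160,
}
labels = ["negative", "positive"]


def per_class_metrics(label):
    """Return (precision, recall, f1) for one class."""
    true_positives = confusion[(label, label)]
    # Everything the model predicted as this class (one table row).
    predicted_total = sum(confusion[(label, actual)] for actual in labels)
    # Everything that really belongs to this class (one table column).
    actual_total = sum(confusion[(predicted, label)] for predicted in labels)
    precision = true_positives / predicted_total
    recall = true_positives / actual_total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = sum(confusion.values())
accuracy = sum(confusion[(label, label)] for label in labels) / total

metrics = {label: per_class_metrics(label) for label in labels}
# Macro-average: unweighted mean over the two classes (an assumption).
macro = [sum(m[i] for m in metrics.values()) / len(labels) for i in range(3)]

print(f"samples={total} accuracy={accuracy:.4f}")
for label, (p, r, f) in metrics.items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print("macro precision/recall/f1:", [round(m, 2) for m in macro])
```

Under this assumption, the macro-averaged precision, recall, and F1 all round to 0.81, matching the figures listed above, while the per-class values differ by a few points in each direction.
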

The model supports a maximum sequence length of 512 tokens.

## How to Use
```python
from huggingface_hub import hf_hub_download
import fasttext
from khmernltk import word_tokenize

model = fasttext.load_model(hf_hub_download("tykea/khmer-fasttext-sentiment-analysis", "model.bin"))

def predict(text):
    # Tokenize the text
    tokens = word_tokenize(text)
    # Join tokens back into a single string
    tokenized_text = ' '.join(tokens)
    # Make predictions
    predictions = model.predict(tokenized_text)
    # Map labels to human-readable format
    label_mapping = {
        '__label__0': 'negative',
        '__label__1': 'positive'
    }
    # Get the predicted label
    predicted_label = predictions[0][0]
    # Map the predicted label
    human_readable_label = label_mapping.get(predicted_label, 'unknown')
    return human_readable_label


# Example input (roughly: "This is a negative sentence for Khmer people")
print(predict('នេះគីជាល្បះអវិជ្ជមានសម្រាប់ប្រជាជនខ្មែរ'))
```