tykea committed on
Commit
865e91b
1 Parent(s): d6c765c

Update README.md

  1. README.md +62 -3
README.md CHANGED
@@ -1,3 +1,62 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - km
+ metrics:
+ - accuracy
+ base_model:
+ - facebook/fasttext-km-vectors
+ pipeline_tag: text-classification
+ library_name: fasttext
+ ---
+
+ **This is a fine-tuned version of the FastText KM model for sentiment analysis. It classifies Khmer text into two categories: Positive and Negative.**
+
+ - **Task**: Sentiment analysis (binary classification).
+ - **Languages Supported**: Khmer.
+ - **Intended Use Cases**:
+   - Analyzing customer reviews.
+   - Detecting sentiment on social media.
+ - **Limitations**:
+   - Performance may degrade on languages or domains not present in the training data.
+   - Does not handle sarcasm or highly ambiguous inputs well.
+
+ The model was evaluated on a test set of 400 samples, achieving the following performance:
+
+ - **Test Accuracy**: 81%
+ - **Precision**: 81%
+ - **Recall**: 81%
+ - **F1 Score**: 81%
+
+ Confusion matrix (rows are predicted labels, columns are actual labels):
+
+ | Predicted \ Actual | Negative | Positive |
+ |--------------------|----------|----------|
+ | **Negative**       | 165      | 44       |
+ | **Positive**       | 31       | 160      |
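
The reported scores can be cross-checked against the confusion matrix. The sketch below assumes the table's rows are predicted labels and its columns are actual labels, as the `Predicted\Actual` header suggests, and it assumes macro averaging, since the README does not say which averaging was used. The names `confusion` and `per_class_metrics` are illustrative and not part of the model's code.

```python
# Confusion matrix from the README: outer key = predicted label, inner key = actual label.
confusion = {
    "negative": {"negative": 165, "positive": 44},
    "positive": {"negative": 31, "positive": 160},
}
labels = list(confusion)


def per_class_metrics(matrix, label):
    """Return (precision, recall, f1) for one class."""
    true_positives = matrix[label][label]
    predicted_total = sum(matrix[label].values())            # everything predicted as `label`
    actual_total = sum(matrix[p][label] for p in matrix)     # everything that really is `label`
    precision = true_positives / predicted_total
    recall = true_positives / actual_total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = sum(sum(row.values()) for row in confusion.values())
accuracy = sum(confusion[label][label] for label in labels) / total

# Macro average: unweighted mean of the per-class scores (an assumption, see above).
per_class = [per_class_metrics(confusion, label) for label in labels]
macro_precision, macro_recall, macro_f1 = (
    sum(scores[i] for scores in per_class) / len(labels) for i in range(3)
)

print(f"samples={total} accuracy={accuracy:.4f}")
print(f"macro precision={macro_precision:.4f} recall={macro_recall:.4f} f1={macro_f1:.4f}")
```

Under these assumptions the matrix sums to the 400 test samples, and accuracy, precision, recall, and F1 all round to the 81% stated above.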
+
+ The model supports a maximum sequence length of 512 tokens.
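
One way to respect the stated 512-token limit is to truncate the token list before joining it into the string passed to the model. This is a minimal sketch, assuming the input has already been split into a list of string tokens; `prepare_tokens` and `MAX_TOKENS` are hypothetical names, not part of fastText or this repository. It also removes line breaks, because the fastText Python `predict` call rejects input that contains `\n`.

```python
from typing import List

MAX_TOKENS = 512  # limit stated in the README


def prepare_tokens(tokens: List[str], max_tokens: int = MAX_TOKENS) -> str:
    """Return a single-line, space-joined string of at most `max_tokens` tokens."""
    # Splitting each token on whitespace drops empty tokens and stray
    # newlines or tabs, so the result is always one line.
    cleaned = [piece for token in tokens for piece in token.split()]
    return " ".join(cleaned[:max_tokens])


if __name__ == "__main__":
    print(prepare_tokens(["good", "\n", "service"]))
```

Truncating from the end keeps the beginning of the text; whether the start or the end of a long review carries more sentiment is a choice the README does not address.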
+
+ ## How to Use
+
+ ```python
+ import fasttext
+ from khmernltk import word_tokenize
+
+ # Load the model (point this path at your local copy of sentiment_model.ftz)
+ model = fasttext.load_model('/Users/tykea/Desktop/fasttext-finetuned/sentiment_model.ftz')
+
+ def predict(text):
+     # Tokenize the text
+     tokens = word_tokenize(text)
+     # Join tokens back into a single space-separated string
+     tokenized_text = ' '.join(tokens)
+     # Make predictions; the result is a (labels, probabilities) pair
+     predictions = model.predict(tokenized_text)
+     # Map labels to human-readable format
+     label_mapping = {
+         '__label__0': 'negative',
+         '__label__1': 'positive'
+     }
+     # Get the top predicted label
+     predicted_label = predictions[0][0]
+     # Map the predicted label
+     human_readable_label = label_mapping.get(predicted_label, 'unknown')
+     return human_readable_label
+
+ # Example input, roughly "This is a negative sentence for Khmer people"
+ print(predict('នេះគីជាល្បះអវិជ្ជមានសម្រាប់ប្រជាជនខ្មែរ'))
+ ```
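
The example above returns only the label. For a single input string, the fastText Python `predict` call returns a pair of labels and probabilities, so a confidence score can be reported alongside the label. The sketch below works on that pair directly so it runs without the model file; `format_prediction` is a hypothetical helper, and the sample values are made up for illustration rather than taken from this model.

```python
from typing import Sequence, Tuple

# Same label names as the README's usage example.
LABEL_MAPPING = {"__label__0": "negative", "__label__1": "positive"}


def format_prediction(
    labels: Sequence[str], probabilities: Sequence[float]
) -> Tuple[str, float]:
    """Turn a (labels, probabilities) pair into (readable_label, confidence)."""
    if len(labels) == 0:
        # No label returned, for example when a probability threshold filters everything out.
        return "unknown", 0.0
    readable = LABEL_MAPPING.get(labels[0], "unknown")
    return readable, float(probabilities[0])


# Stand-in for `model.predict(tokenized_text)`; these numbers are illustrative only.
example_output = (("__label__1",), [0.93])
print(format_prediction(*example_output))
```

With a loaded model, the call would look like `format_prediction(*model.predict(tokenized_text))`. A low confidence can then be used to flag inputs such as sarcasm or ambiguous text, which the limitations section says the model handles poorly.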