---
license: apache-2.0
language:
- km
metrics:
- accuracy
base_model:
- facebook/fasttext-km-vectors
pipeline_tag: text-classification
library_name: fasttext
---

**This is a fine-tuned version of the FastText KM model for sentiment analysis. It classifies Khmer text into two categories: Positive and Negative.**

- **Task**: Sentiment analysis (binary classification).
- **Languages Supported**: Khmer.
- **Intended Use Cases**:
  - Analyzing customer reviews.
  - Social media sentiment detection.
- **Limitations**:
  - Performance may degrade on languages or domains not present in the training data.
  - Does not handle sarcasm or highly ambiguous inputs well.
- The model was evaluated on a test set of 400 samples, achieving the following performance:
  - **Test Accuracy**: 81%
  - **Precision**: 81%
  - **Recall**: 81%
  - **F1 Score**: 81%

Confusion Matrix:

| Predicted\Actual | Negative | Positive |
|------------------|----------|----------|
| **Negative**     | 165      | 44       |
| **Positive**     | 31       | 160      |

The model supports a maximum sequence length of 512 tokens.

## How to Use

```python
from huggingface_hub import hf_hub_download
import fasttext
from khmernltk import word_tokenize

# Download the model file from the Hub and load it with fastText
model = fasttext.load_model(
    hf_hub_download("tykea/khmer-fasttext-sentiment-analysis", "model.bin")
)

# Map fastText labels to a human-readable format
label_mapping = {
    '__label__0': 'negative',
    '__label__1': 'positive'
}

def predict(text):
    # Tokenize the text into Khmer words
    tokens = word_tokenize(text)

    # Join tokens back into a single space-separated string
    tokenized_text = ' '.join(tokens)

    # Make predictions; fastText returns a (labels, probabilities) pair
    predictions = model.predict(tokenized_text)

    # Get the top predicted label
    predicted_label = predictions[0][0]

    # Map the predicted label, falling back to 'unknown'
    return label_mapping.get(predicted_label, 'unknown')

print(predict('នេះគីជាល្បះអវិជ្ជមានសម្រាប់ប្រជាជនខ្មែរ'))
```
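
The reported evaluation figures can be cross-checked against the confusion matrix given in this card. The sketch below recomputes accuracy, precision, recall, and F1 from the four cell counts. The card does not say how precision, recall, and F1 were averaged across the two classes, so macro averaging is an assumption here, and the helper name `per_class_metrics` is illustrative only.

```python
# Confusion matrix from this card: outer key = predicted label, inner key = actual label.
confusion = {
    "negative": {"negative": 165, "positive": 44},
    "positive": {"negative": 31, "positive": 160},
}
labels = list(confusion)


def per_class_metrics(label):
    """Return (precision, recall, f1) for one class (hypothetical helper)."""
    true_positives = confusion[label][label]
    predicted_total = sum(confusion[label].values())          # row sum
    actual_total = sum(confusion[p][label] for p in labels)   # column sum
    precision = true_positives / predicted_total
    recall = true_positives / actual_total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = sum(sum(row.values()) for row in confusion.values())
accuracy = sum(confusion[label][label] for label in labels) / total

# Assumption: macro averaging, i.e. the unweighted mean over both classes.
per_class = [per_class_metrics(label) for label in labels]
macro_precision, macro_recall, macro_f1 = [
    sum(scores[i] for scores in per_class) / len(labels) for i in range(3)
]

print(f"samples:   {total}")
print(f"accuracy:  {accuracy:.2%}")
print(f"precision: {macro_precision:.2%}")
print(f"recall:    {macro_recall:.2%}")
print(f"f1:        {macro_f1:.2%}")
```

Under this assumption, each value rounds to the 81% reported above, and the cell counts sum to the stated 400 test samples.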