---
license: apache-2.0
language:
- km
metrics:
- accuracy
base_model:
- facebook/fasttext-km-vectors
pipeline_tag: text-classification
library_name: fasttext
---

This is a fine-tuned version of the FastText KM model for sentiment analysis. It classifies Khmer text into two categories: Positive and Negative.

- Task: Sentiment analysis (binary classification).
- Languages Supported: Khmer.
- Intended Use Cases:
  - Analyzing customer reviews.
  - Social media sentiment detection.
- Limitations:
  - Performance may degrade on languages or domains not present in the training data.
  - Does not handle sarcasm or highly ambiguous inputs well.

The model was evaluated on a test set of 400 samples, achieving the following performance:

- Test Accuracy: 81%
- Precision: 81%
- Recall: 81%
- F1 Score: 81%

Confusion Matrix:

| Predicted \ Actual | Negative | Positive |
|--------------------|----------|----------|
| Negative           | 165      | 44       |
| Positive           | 31       | 160      |
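
The reported figures can be cross-checked against the confusion matrix. The sketch below is only an illustration: the card does not say how precision, recall, and F1 were averaged, so the macro average (unweighted mean over both classes) is an assumption, and the names `confusion`, `LABELS`, and `class_metrics` are made up for this example.

```python
# Confusion matrix copied from the model card.
# Keys are (predicted, actual) pairs; values are sample counts.
confusion = {
    ("negative", "negative"): 165,
    ("negative", "positive"): 44,
    ("positive", "negative"): 31,
    ("positive", "positive"): 160,
}
LABELS = ("negative", "positive")


def class_metrics(label):
    """Return (precision, recall, f1) for one class."""
    true_positives = confusion[(label, label)]
    predicted_total = sum(confusion[(label, actual)] for actual in LABELS)
    actual_total = sum(confusion[(predicted, label)] for predicted in LABELS)
    precision = true_positives / predicted_total
    recall = true_positives / actual_total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


total = sum(confusion.values())
accuracy = sum(confusion[(label, label)] for label in LABELS) / total

# Macro average (an assumption): unweighted mean over the two classes.
per_class = [class_metrics(label) for label in LABELS]
macro_precision, macro_recall, macro_f1 = (
    sum(values) / len(LABELS) for values in zip(*per_class)
)

print(f"samples={total} accuracy={accuracy:.1%}")
print(f"precision={macro_precision:.1%} recall={macro_recall:.1%} f1={macro_f1:.1%}")
```

Under this assumption every value rounds to 81%, consistent with the reported metrics. The per-class numbers differ by a few points: roughly 79% precision and 84% recall for the negative class, and roughly 84% precision and 78% recall for the positive class.
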
The model supports a maximum sequence length of 512 tokens.
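
One way to stay within that limit is to truncate long inputs before prediction. This is a minimal sketch, assuming the limit counts the space-separated tokens produced by the tokenizer; the card does not say how tokens are counted or what happens to longer inputs. `MAX_TOKENS` and `truncate_tokens` are hypothetical names, not part of the model or of fastText.

```python
MAX_TOKENS = 512  # limit stated in the model card


def truncate_tokens(tokens, max_tokens=MAX_TOKENS):
    """Keep at most max_tokens tokens and join them with single spaces."""
    return " ".join(tokens[:max_tokens])


# Stand-in token list; in practice this would come from a Khmer word tokenizer.
long_input = ["w"] * 600
truncated = truncate_tokens(long_input)
print(len(truncated.split()))
```

Truncating from the end is the simplest choice. For long reviews where the verdict comes last, keeping the tail instead may work better.
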
## How to Use

```python
from huggingface_hub import hf_hub_download
import fasttext
from khmernltk import word_tokenize

# Download the model file from the Hugging Face Hub and load it with fastText
model = fasttext.load_model(hf_hub_download("tykea/khmer-fasttext-sentiment-analysis", "model.bin"))

def predict(text):
    # Tokenize the text
    tokens = word_tokenize(text)
    # Join tokens back into a single space-separated string
    tokenized_text = ' '.join(tokens)
    # Make predictions
    predictions = model.predict(tokenized_text)
    # Map labels to human-readable format
    label_mapping = {
        '__label__0': 'negative',
        '__label__1': 'positive'
    }
    # Get the predicted label
    predicted_label = predictions[0][0]
    # Map the predicted label
    human_readable_label = label_mapping.get(predicted_label, 'unknown')
    return human_readable_label

# Example input (roughly: "This is a negative sentence for the Khmer people")
print(predict('នេះគីជាល្បះអវិជ្ជមានសម្រាប់ប្រជាជនខ្មែរ'))
```