---
license: mit
datasets:
- HausaNLP/NaijaSenti-Twitter
language:
- ha
metrics:
- accuracy
- f1
- precision
- recall
base_model: google-bert/bert-base-cased
pipeline_tag: text-classification
library_name: transformers
tags:
- NLP
- sentiment-analysis
- hausa
---
Model Name: Hausa Sentiment Analysis
Model ID: Kumshe/Hausa-sentiment-analysis
Language: Hausa
Model Description
This model is a BERT-based model fine-tuned for sentiment analysis in the Hausa language. It classifies social media text into one of three sentiment categories: positive, negative, or neutral.
Intended Use
- Primary Use Case: Sentiment analysis for Hausa social media content, such as tweets or Facebook posts.
- Target Users: NLP researchers, businesses analyzing social media, and developers building sentiment analysis tools for Hausa language content.
- Example Usage:
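```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the model and tokenizer (from_pretrained returns the model in eval mode)
tokenizer = AutoTokenizer.from_pretrained("Kumshe/Hausa-sentiment-analysis")
model = AutoModelForSequenceClassification.from_pretrained("Kumshe/Hausa-sentiment-analysis")

# Encode the input text, truncating inputs longer than the model's maximum length
inputs = tokenizer("Your Hausa text here", return_tensors="pt", truncation=True)

# Get model predictions without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Map the highest-scoring logit to its label name from the model config
predicted_id = outputs.logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])
```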
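The model returns one raw logit per sentiment class. To inspect its confidence rather than only the top prediction, the logits can be converted to probabilities with softmax. The sketch below is plain Python so it runs without downloading the model. The label order `["negative", "neutral", "positive"]` is a hypothetical placeholder: the card does not state which index maps to which sentiment, so the real order should be read from `model.config.id2label`.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical label order for illustration only; the real mapping is
# stored in the model config (model.config.id2label).
HYPOTHETICAL_LABELS = ["negative", "neutral", "positive"]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single row of logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_prediction(
    logits: Sequence[float], labels: Sequence[str] = HYPOTHETICAL_LABELS
) -> Dict[str, object]:
    """Return the top label and the probability assigned to every label."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "scores": dict(zip(labels, probs))}


# Made-up logits, not real model output.
pred = logits_to_prediction([0.2, -1.0, 2.3])
print(pred["label"])  # positive
```

With the real model, the inputs would be `outputs.logits[0].tolist()` and `[model.config.id2label[i] for i in range(model.config.num_labels)]`.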
Model Architecture
- Base Model: BERT (Bidirectional Encoder Representations from Transformers)
- Pre-trained Model: bert-base-cased from the Hugging Face Transformers library.
- Fine-Tuned Model: Fine-tuned for 40 epochs on a Hausa sentiment dataset.
Training Data
- Data Source: The model was trained on a dataset containing 35,000 examples from social media platforms such as Twitter and Facebook.
- Data Split:
  - Training Set: 80% of the data
  - Validation Set: 20% of the data
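The card does not say how the 80/20 split was drawn. A common approach is a seeded random shuffle followed by a single cut; the sketch below assumes that, and the function name `train_val_split` and the seed `42` are illustrative choices, not the author's code. Applied to 35,000 examples, an 80/20 split yields 28,000 training and 7,000 validation examples.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_val_split(
    examples: Sequence[T], train_frac: float = 0.8, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the examples with a fixed seed, then cut at train_frac."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # local RNG keeps the split reproducible
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]


# Placeholder integers standing in for the 35,000 labelled posts.
data = list(range(35_000))
train, val = train_val_split(data)
print(len(train), len(val))  # 28000 7000
```

If the sentiment classes are imbalanced, a stratified split (for example scikit-learn's `train_test_split` with its `stratify` argument) keeps the label proportions the same in both sets.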
Training Details
- Number of Epochs: 40
- Batch Size:
  - Per-device training batch size: 32
  - Per-device evaluation batch size: 64
- Learning Rate Schedule: 10 warm-up steps
- Weight Decay: 0.01
- Optimizer: AdamW
- Training Hardware: Trained on Kaggle using 2 NVIDIA T4 GPUs.
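The training hyperparameters can be sanity-checked with a little arithmetic. This sketch assumes a 28,000-example training split (80% of 35,000), data-parallel training across both T4 GPUs with no gradient accumulation, and the linear warm-up/decay schedule that the Transformers `Trainer` uses by default. The card does not state the scheduler type or the peak learning rate, so `peak_lr` is a placeholder.

```python
import math

# Values from the model card; the keys mirror Transformers TrainingArguments names.
config = {
    "num_train_epochs": 40,
    "per_device_train_batch_size": 32,
    "warmup_steps": 10,
    "weight_decay": 0.01,
}
NUM_GPUS = 2             # from the card: 2 NVIDIA T4 GPUs
TRAIN_EXAMPLES = 28_000  # assumption: 80% of 35,000 examples

effective_batch = config["per_device_train_batch_size"] * NUM_GPUS
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / effective_batch)  # last batch is partial
total_steps = steps_per_epoch * config["num_train_epochs"]


def linear_lr(step: int, peak_lr: float, warmup: int, total: int) -> float:
    """Assumed schedule: linear warm-up to peak_lr, then linear decay to 0."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


peak_lr = 2e-5  # placeholder: the card does not report the learning rate
print(effective_batch, steps_per_epoch, total_steps)  # 64 438 17520
print(linear_lr(5, peak_lr, config["warmup_steps"], total_steps))
```

Under these assumptions, warm-up covers only 10 of about 17,500 optimizer steps, so the learning rate reaches its peak almost immediately and then decays over the remaining training.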
Evaluation Metrics
- Evaluation Loss: 0.6265
- Accuracy: 73.47%
- F1 Score: 73.47%
- Precision: 73.54%
- Recall: 73.47%
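The card does not state how the per-class scores were averaged. Recall matching accuracy exactly (73.47%) is what support-weighted averaging produces, because weighted recall always equals accuracy for single-label classification; this suggests, but does not confirm, that weighted averages were reported. The plain-Python sketch below computes weighted precision, recall, and F1 (the same idea as scikit-learn's `average="weighted"`) and shows the identity on toy labels.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_prf(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Support-weighted precision, recall and F1, plus plain accuracy."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    n = len(y_true)
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        actual = support[label]
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = actual / n  # classes absent from y_true get weight 0
        totals["precision"] += weight * precision
        totals["recall"] += weight * recall
        totals["f1"] += weight * f1
    totals["accuracy"] = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return totals


# Toy labels, not the model's real predictions.
y_true = ["pos", "pos", "neg", "neu", "neg", "neu", "pos", "neg"]
y_pred = ["pos", "neg", "neg", "neu", "neu", "neu", "pos", "pos"]
scores = weighted_prf(y_true, y_pred)
print(round(scores["accuracy"], 4), round(scores["recall"], 4))  # 0.625 0.625
```

Weighted precision and F1 generally differ from accuracy, which is consistent with the card's slightly higher precision (73.54%).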
Model Performance
On the evaluation data, the model reaches 73.47% accuracy, with precision, recall, and F1 within 0.07 percentage points of one another, so it shows no marked trade-off between precision and recall. This makes it a reasonable choice for general sentiment analysis of Hausa social media text.
Limitations
- The model may not generalize well to Hausa text outside social media (e.g., formal writing or literature).
- Performance may degrade on text containing slang or regional dialects not well-represented in the training data.
- The model reflects the composition of its training data, so any biases in that data may carry over into its predictions.
Ethical Considerations
- Sentiment analysis models can potentially amplify biases present in the training data.
- Use cautiously in sensitive applications to avoid unintended consequences.
- Consider privacy implications and compliance with applicable data protection laws, especially when analyzing social media content.
License
This model is released under the MIT License.
Citation
If you use this model in your work, please cite it as follows:
@misc{Kumshe2024HausaSentimentAnalysis,
author = {Umar Muhammad Mustapha Kumshe},
title = {Hausa Sentiment Analysis},
year = {2024},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/Kumshe/Hausa-sentiment-analysis}},
}
Contributions
This model was fine-tuned by Umar Muhammad Mustapha Kumshe. Feel free to contribute, provide feedback, or raise issues on the model repository.