
metadata

license: mit
datasets:
  - HausaNLP/NaijaSenti-Twitter
language:
  - ha
metrics:
  - accuracy
  - f1
  - precision
  - recall
base_model: google-bert/bert-base-cased
pipeline_tag: text-classification
library_name: transformers
tags:
  - NLP
  - sentiment-analysis
  - hausa

Model Name: Hausa Sentiment Analysis
Model ID: Kumshe/Hausa-sentiment-analysis
Language: Hausa

Model Description

This is a BERT-based model fine-tuned for sentiment analysis in the Hausa language. It classifies social media text into one of three sentiment categories: positive, negative, or neutral.

Intended Use

Primary Use Case: Sentiment analysis for Hausa social media content, such as tweets or Facebook posts.
Target Users: NLP researchers, businesses analyzing social media, and developers building sentiment analysis tools for Hausa language content.

Example Usage:

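import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Kumshe/Hausa-sentiment-analysis")
model = AutoModelForSequenceClassification.from_pretrained("Kumshe/Hausa-sentiment-analysis")

# Encode the input text (truncate inputs longer than the model's limit)
inputs = tokenizer("Your Hausa text here", return_tensors="pt", truncation=True)

# Get model predictions without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Map the highest-scoring logit to its label name from the model config
predicted_id = outputs.logits.argmax(dim=-1).item()
print(model.config.id2label[predicted_id])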
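
The model returns raw logits rather than probabilities. As a rough sketch of the post-processing step, the snippet below applies a softmax and picks the most likely class. The label order in `ID2LABEL` and the logit values are made up for illustration; the real mapping is stored in `model.config.id2label` and should be read from there.

```python
import math

# Hypothetical label order; the real mapping lives in model.config.id2label.
ID2LABEL = {0: "positive", 1: "neutral", 2: "negative"}

def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits, id2label=ID2LABEL):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Made-up logits for a single input, for illustration only.
label, prob = predict_label([2.1, 0.3, -1.2])
print(label, round(prob, 3))
```

With real model output, the list passed to `predict_label` would be something like `outputs.logits[0].tolist()`.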

Model Architecture

Base Model: BERT (Bidirectional Encoder Representations from Transformers)
Pre-trained Model: bert-base-cased (google-bert/bert-base-cased on the Hugging Face Hub).
Fine-Tuned Model: Fine-tuned for 40 epochs on a Hausa sentiment dataset.

Training Data

Data Source: The model was trained on a dataset containing 35,000 examples from social media platforms such as Twitter and Facebook.
Data Split:
- Training Set: 80% of the data
- Validation Set: 20% of the data
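
A minimal sketch of an 80/20 split, assuming a simple seeded random shuffle. The card does not say whether the original split was random or stratified by label, so the function name, the seed, and the shuffling strategy here are illustrative assumptions.

```python
import random

def train_validation_split(examples, validation_fraction=0.2, seed=42):
    """Shuffle a copy of `examples` and split it into (train, validation)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

# Stand-in for the 35,000 labelled examples (integer IDs, no real text).
examples = list(range(35_000))
train, validation = train_validation_split(examples)
print(len(train), len(validation))  # 28000 7000
```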

Training Details

Number of Epochs: 40
Batch Size:
- Per device training batch size: 32
- Per device evaluation batch size: 64
Warm-up Steps: 10
Weight Decay: 0.01
Optimizer: AdamW
Training Hardware: Trained on Kaggle using 2 NVIDIA T4 GPUs.
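
To put these settings in perspective, the following back-of-the-envelope calculation estimates the number of optimizer steps. It assumes both GPUs were used in data-parallel mode with no gradient accumulation and that the last partial batch of each epoch was kept; none of this is stated in the card, so the totals are estimates only.

```python
import math

# Values stated in this model card.
num_examples = 35_000
train_fraction = 0.8
per_device_train_batch_size = 32
num_train_epochs = 40
warmup_steps = 10

# Assumption: both T4 GPUs ran data-parallel, with no gradient accumulation.
num_gpus = 2

train_examples = round(num_examples * train_fraction)
effective_batch_size = per_device_train_batch_size * num_gpus
steps_per_epoch = math.ceil(train_examples / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs

print(train_examples, effective_batch_size, steps_per_epoch, total_steps)
print(f"warm-up share of training: {warmup_steps / total_steps:.4%}")
```

Under these assumptions, training runs for roughly 438 steps per epoch and 17,520 steps in total, so the 10 warm-up steps cover only a very small fraction of the run.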

Evaluation Metrics

Evaluation Loss: 0.6265
Accuracy: 73.47%
F1 Score: 73.47%
Precision: 73.54%
Recall: 73.47%
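
The card does not state how precision, recall, and F1 were averaged across the three classes. Recall being identical to accuracy is consistent with support-weighted averaging, where weighted recall always equals accuracy, but that is an inference and not something the source confirms. The sketch below computes weighted metrics in plain Python on toy labels to show that property; the function name and the data are illustrative only.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    support = Counter(y_true)
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n  # each class counts in proportion to its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels for illustration only; not taken from the evaluation set.
y_true = ["positive", "positive", "negative", "negative",
          "neutral", "neutral", "neutral", "positive"]
y_pred = ["positive", "negative", "negative", "negative",
          "neutral", "positive", "neutral", "positive"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```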

Model Performance

The model reaches about 73.5% accuracy on the evaluation set. Precision, recall, and F1 score are all within 0.1 percentage points of each other, which indicates balanced performance and makes the model suitable for general sentiment analysis of Hausa text.

Limitations

The model may not generalize well to other types of Hausa text outside of social media (e.g., formal writing or literature).
Performance may degrade on text containing slang or regional dialects not well-represented in the training data.
The model is biased towards the examples in the training dataset; biases in the data may affect predictions.

Ethical Considerations

Sentiment analysis models can potentially amplify biases present in the training data.
Use cautiously in sensitive applications to avoid unintended consequences.
Consider the impact on privacy and data protection laws, especially when analyzing social media content.

License

This model is released under the MIT License, as declared in the metadata above.

Citation

If you use this model in your work, please cite it as follows:

@misc{Kumshe2024HausaSentimentAnalysis,
  author = {Umar Muhammad Mustapha Kumshe},
  title = {Hausa Sentiment Analysis},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Kumshe/Hausa-sentiment-analysis}},
}

Contributions

This model was fine-tuned by Umar Muhammad Mustapha Kumshe. Feel free to contribute, provide feedback, or raise issues on the model repository.