---
license: mit
datasets:
  - HausaNLP/NaijaSenti-Twitter
language:
  - ha
metrics:
  - accuracy
  - f1
  - precision
  - recall
base_model: google-bert/bert-base-cased
pipeline_tag: text-classification
library_name: transformers
tags:
  - NLP
  - sentiment-analysis
  - hausa
---

Model Name: Hausa Sentiment Analysis
Model ID: Kumshe/Hausa-sentiment-analysis
Language: Hausa


Model Description

This model is a BERT-based model fine-tuned for sentiment analysis of Hausa text. It classifies social media text into three sentiment categories: positive, negative, or neutral.
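
As a quick illustration, the model can be invoked through the text-classification pipeline. This is a minimal sketch: the label names returned depend on the model's id2label configuration, and the example sentence is an arbitrary Hausa phrase, not taken from the training data.

    from transformers import pipeline
    
    # Quick-start via the text-classification pipeline; the label names in the
    # output depend on the model's id2label mapping.
    classifier = pipeline("text-classification", model="Kumshe/Hausa-sentiment-analysis")
    print(classifier("Ina son wannan waka sosai"))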

Intended Use

  • Primary Use Case: Sentiment analysis for Hausa social media content, such as tweets or Facebook posts.
  • Target Users: NLP researchers, businesses analyzing social media, and developers building sentiment analysis tools for Hausa language content.
  • Example Usage:
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    
    # Load the model and tokenizer
    tokenizer = AutoTokenizer.from_pretrained("Kumshe/Hausa-sentiment-analysis")
    model = AutoModelForSequenceClassification.from_pretrained("Kumshe/Hausa-sentiment-analysis")
    
    # Encode the input text
    inputs = tokenizer("Your Hausa text here", return_tensors="pt")
    
    # Get model predictions without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
    
    # Map the highest-scoring logit to its sentiment label
    predicted_class = outputs.logits.argmax(dim=-1).item()
    print(model.config.id2label[predicted_class])
    

Model Architecture

  • Base Model: BERT (Bidirectional Encoder Representations from Transformers)
  • Pre-trained Model: bert-base-cased from the Hugging Face Transformers library.
  • Fine-Tuned Model: Fine-tuned for 40 epochs on a Hausa sentiment dataset.
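
The setup can be sketched as follows; this is not the exact fine-tuning script, only an illustration of how the backbone named above would be initialized. num_labels=3 follows the positive/negative/neutral label set in the model description.

    from transformers import AutoModelForSequenceClassification
    
    # Hedged sketch: bert-base-cased backbone with a fresh three-class
    # classification head, as the fine-tuning described above would require.
    model = AutoModelForSequenceClassification.from_pretrained(
        "google-bert/bert-base-cased", num_labels=3
    )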

Training Data

  • Data Source: The model was trained on a dataset containing 35,000 examples from social media platforms such as Twitter and Facebook.
  • Data Split:
    • Training Set: 80% of the data
    • Validation Set: 20% of the data
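
For reference, the 80/20 split could be reproduced along these lines. The dataset ID comes from the card metadata, while the "hau" configuration name and the seed are assumptions rather than details stated on this card.

    from datasets import load_dataset
    
    # Hedged sketch of the 80/20 train/validation split; the config name and
    # seed are assumptions, not confirmed by the card.
    dataset = load_dataset("HausaNLP/NaijaSenti-Twitter", "hau", split="train")
    split = dataset.train_test_split(test_size=0.2, seed=42)
    train_data, eval_data = split["train"], split["test"]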

Training Details

  • Number of Epochs: 40
  • Batch Size:
    • Per device training batch size: 32
    • Per device evaluation batch size: 64
  • Learning Rate Schedule: 10 warm-up steps
  • Weight Decay: 0.01
  • Optimizer: AdamW
  • Training Hardware: Trained on Kaggle using 2 NVIDIA T4 GPUs.
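
These hyperparameters map onto Hugging Face TrainingArguments roughly as follows. This is a minimal sketch: the output directory is a placeholder, not the actual configuration used for training.

    from transformers import TrainingArguments
    
    # Minimal sketch of training arguments matching the settings above.
    # AdamW is the Trainer's default optimizer, so it is not set explicitly.
    training_args = TrainingArguments(
        output_dir="hausa-sentiment-analysis",  # placeholder path
        num_train_epochs=40,
        per_device_train_batch_size=32,
        per_device_eval_batch_size=64,
        warmup_steps=10,
        weight_decay=0.01,
    )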

Evaluation Metrics

  • Evaluation Loss: 0.6265
  • Accuracy: 73.47%
  • F1 Score: 73.47%
  • Precision: 73.54%
  • Recall: 73.47%
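
The metrics above can be computed with a compute_metrics callback along these lines; the weighted averaging is an assumption, as the card does not state which averaging was used.

    import numpy as np
    from sklearn.metrics import accuracy_score, precision_recall_fscore_support
    
    # Hedged sketch of a compute_metrics callback for the Trainer; "weighted"
    # averaging is an assumption, not confirmed by the card.
    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        predictions = np.argmax(logits, axis=-1)
        precision, recall, f1, _ = precision_recall_fscore_support(
            labels, predictions, average="weighted"
        )
        return {
            "accuracy": accuracy_score(labels, predictions),
            "f1": f1,
            "precision": precision,
            "recall": recall,
        }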

Model Performance

On the held-out validation set, the model achieves balanced precision, recall, and F1 (each around 73%), making it suitable for general sentiment analysis of Hausa social media text.

Limitations

  • The model may not generalize well to other types of Hausa text outside of social media (e.g., formal writing or literature).
  • Performance may degrade on text containing slang or regional dialects not well-represented in the training data.
  • The model reflects the distribution of its training data; biases present in that data may carry over into predictions.

Ethical Considerations

  • Sentiment analysis models can potentially amplify biases present in the training data.
  • Use cautiously in sensitive applications to avoid unintended consequences.
  • Consider the impact on privacy and data protection laws, especially when analyzing social media content.

License

This model is released under the MIT License.

Citation

If you use this model in your work, please cite it as follows:

@misc{Kumshe2024HausaSentimentAnalysis,
  author = {Umar Muhammad Mustapha Kumshe},
  title = {Hausa Sentiment Analysis},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Kumshe/Hausa-sentiment-analysis}},
}

Contributions

This model was fine-tuned by Umar Muhammad Mustapha Kumshe. Feel free to contribute, provide feedback, or raise issues on the model repository.