---
pipeline_tag: text-classification
tags:
- sentiment-analysis
- Moroccan-Darija
- MSA
base_model: CAMeL-Lab/bert-base-arabic-camelbert-da-sentiment
metrics:
- Accuracy
- Precision
- Recall
- F1-Score
language:
- ar
---
# Fine-tuned CAMeL-BERT Model for Sentiment Analysis in Moroccan Darija
- **Model Name:** CAMeL-BERT Fine-Tuned for Moroccan Darija Sentiment Analysis
- **Model ID:** `NerdyPy/fine_tuned_model_sentiment_analysis`
- **Language:** Arabic (Modern Standard Arabic and Moroccan Darija)
- **Task:** Sentiment analysis (Negative, Neutral, Positive)
## Model Description
This model is a fine-tuned version of CAMeL-Lab's CAMeLBERT-DA sentiment model (`CAMeL-Lab/bert-base-arabic-camelbert-da-sentiment`). It is adapted for sentiment analysis in Moroccan Darija, a highly under-resourced Arabic dialect. It classifies Arabic text in both Modern Standard Arabic (MSA) and Moroccan Darija into three sentiment categories:
- Negative
- Neutral
- Positive
By focusing on Moroccan Darija, this model helps address the scarcity of NLP resources for the dialect. It also extends sentiment analysis to text that mixes MSA and Darija, which is common in Moroccan user-generated content.
## Intended Use
### Primary Use Case
- Sentiment analysis of user-generated content, such as comments and reviews, in Moroccan Darija and MSA.
### Applications
- Analyzing public opinion on social media platforms and online newspapers.
- Assisting researchers in understanding societal attitudes and trends.
- Supporting policymakers and organizations in gauging public sentiment.
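For opinion-tracking applications like these, per-comment predictions are usually aggregated into a distribution. The following is a minimal sketch. It assumes the labels have already been predicted, one per comment, and that they are the strings `negative`, `neutral` and `positive`. The helper name `sentiment_distribution` and the sample predictions are hypothetical. Check `model.config.id2label` for the label names the model actually uses.

```python
from collections import Counter

def sentiment_distribution(labels: list[str]) -> dict[str, float]:
    """Return the share of each sentiment label among the predictions (hypothetical helper)."""
    if not labels:
        return {}
    # Normalize case so "Positive" and "positive" are counted together
    counts = Counter(label.lower() for label in labels)
    total = len(labels)
    return {label: count / total for label, count in counts.most_common()}

# Hypothetical predictions for four comments
predictions = ["positive", "negative", "positive", "neutral"]
print(sentiment_distribution(predictions))  # {'positive': 0.5, 'negative': 0.25, 'neutral': 0.25}
```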
### Users
- Researchers and data scientists in NLP.
- Organizations analyzing Arabic-language social media.
- Developers building sentiment analysis tools for Arabic dialects.
## Limitations and Risks
### Dialectal Variations
- Performance may vary on other Arabic dialects not represented in the training data.
### Data Bias
- The model may reflect biases present in the training datasets.
### Language Mixing (Code-Switching)
The model may struggle with text that heavily mixes Moroccan Darija with other languages (e.g., French, English, or Spanish). This can reduce the accuracy of its sentiment predictions.

Consider "واش كتفهم le français؟" ("Do you understand French?"). The speaker switches from Moroccan Darija to French within the same sentence. The model was trained primarily on Arabic text and has little exposure to the non-Arabic portion, so it may misjudge the sentiment of sentences like this one.
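One practical response to this limitation is to flag inputs that contain Latin-script words before classification, so those predictions can be reviewed separately. The sketch below is a hypothetical heuristic and is not part of the model. It works under these assumptions:

- It checks only for runs of basic or Latin-1 accented Latin letters.
- The names `latin_script_share` and `is_code_switched` are illustrative.
- The 0.2 threshold is arbitrary.

Because it treats every Latin-script word as a sign of mixing, it cannot tell French or English apart from Darija written in Latin letters.

```python
import re

# Matches a run of two or more basic or Latin-1 accented letters (e.g. "ç", "é"), excluding × and ÷
LATIN_WORD = re.compile(r"[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]{2,}")

def latin_script_share(text: str) -> float:
    """Fraction of whitespace-separated tokens containing a Latin-script word (hypothetical heuristic)."""
    tokens = text.split()
    if not tokens:
        return 0.0
    latin = sum(1 for token in tokens if LATIN_WORD.search(token))
    return latin / len(tokens)

def is_code_switched(text: str, threshold: float = 0.2) -> bool:
    """Flag text whose Latin-script share reaches the (arbitrary) threshold."""
    return latin_script_share(text) >= threshold

print(is_code_switched("واش كتفهم le français؟"))          # True: 2 of 4 tokens are Latin-script
print(is_code_switched("العمل في هذا المكان كان رائعاً"))  # False
```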
### Generalization
- Performance may be limited on topics or vocabulary not covered by the training data.
## How to Use
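You can use this model with the Hugging Face Transformers library (PyTorch backend):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("NerdyPy/fine_tuned_model_sentiment_analysis")
model = AutoModelForSequenceClassification.from_pretrained("NerdyPy/fine_tuned_model_sentiment_analysis")
# Example text mixing MSA and Darija: "Working at this place was great, but sometimes there is no organization"
text = "العمل في هذا المكان كان رائعاً، ولكن شي مرات ما كاينش التنظيم"
# Tokenize, run inference without tracking gradients, and print the highest-scoring label
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(model.config.id2label[logits.argmax(dim=-1).item()])
```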
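It is often useful to report a confidence score for every class, not just the single predicted label. This can be done by passing the logits through a softmax. The following is a minimal, dependency-free sketch. It assumes that one input's logits have been converted to a plain list and that `id2label` is the model's label mapping. For example, if `logits` is the model's output tensor for a single text, call `label_scores(logits[0].tolist(), model.config.id2label)`. The helper name `label_scores` and the example logits and label order below are made up for illustration. They are not the model's real output.

```python
import math

def label_scores(logits: list[float], id2label: dict[int, str]) -> dict[str, float]:
    """Convert one example's logits into per-label softmax probabilities (hypothetical helper)."""
    # Subtract the largest logit before exponentiating to avoid overflow
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return {id2label[i]: exp / total for i, exp in enumerate(exps)}

# Made-up logits and label mapping, for illustration only
example_logits = [2.0, 0.5, -1.0]
example_id2label = {0: "positive", 1: "neutral", 2: "negative"}
scores = label_scores(example_logits, example_id2label)
print(max(scores, key=scores.get), round(sum(scores.values()), 6))
```

Subtracting the maximum logit leaves the probabilities unchanged but keeps `math.exp` from overflowing on large logits.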