---
pipeline_tag: text-classification
tags:
- sentiment-analysis
- Moroccan-Darija
- MSA
base_model: CAMeL-Lab/bert-base-arabic-camelbert-da-sentiment
metrics:
- Accuracy
- Precision
- Recall
- F1-Score
language:
- ar
---

# **Fine-tuned CAMeL-BERT Model for Sentiment Analysis in Moroccan Darija**

**Model Name:** CAMeL-BERT Fine-Tuned for Moroccan Darija Sentiment Analysis

**Model ID:** `NerdyPy/fine_tuned_model_sentiment_analysis`

**Language:** Arabic (Modern Standard Arabic and Moroccan Darija)

**Task:** Sentiment Analysis (Negative, Neutral, Positive)

---

## **Model Description**

This model is a fine-tuned version of the [CAMeL-Lab BERT](https://huggingface.co/CAMeL-Lab/bert-base-arabic-camelbert-da-sentiment) model, adapted for sentiment analysis in **Moroccan Darija**, a highly under-resourced Arabic dialect. It classifies Arabic text, in both Modern Standard Arabic (MSA) and Moroccan Darija, into three sentiment categories:

- **Negative**
- **Neutral**
- **Positive**

By focusing on Moroccan Darija, the model helps address the scarcity of NLP resources for this dialect and supports sentiment analysis in the mixed-language contexts common in Moroccan user-generated content.

---

## **Intended Use**

### **Primary Use Case**

- Sentiment analysis of user-generated content, such as comments and reviews, written in Moroccan Darija and MSA.

### **Applications**

- Analyzing public opinion on social media platforms and online news outlets.
- Helping researchers understand societal attitudes and trends.
- Helping policymakers and organizations gauge public sentiment.

### **Users**

- NLP researchers and data scientists.
- Organizations analyzing Arabic-language social media.
- Developers building sentiment analysis tools for Arabic dialects.

---

## **Limitations and Risks**

### **Dialectal Variations**

- **Performance may vary on Arabic dialects that are not represented in the training data.**

### **Data Bias**

- **The model may reflect biases present in the training datasets.**

### **Language Mixing (Code-Switching)**

The model may struggle with text that heavily mixes Moroccan Darija with other languages (e.g., French, English, Spanish), which can reduce the accuracy of sentiment classification. For example:

**"واش كتفهم le français؟"** (roughly, "Do you understand French?") Here the speaker switches from Moroccan Darija to French within the same sentence. Because the model was trained primarily on Arabic text, it may misjudge the sentiment when it cannot interpret the non-Arabic portion.
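
One way to act on this limitation is to flag inputs with a large share of Latin-script letters before trusting the predicted label. The sketch below is only an illustration and is not part of the model or its training pipeline: the helper names `latin_ratio` and `has_heavy_latin_content` and the `0.2` threshold are assumptions chosen for the example, and script detection via Unicode character names is a rough heuristic.

```python
import unicodedata


def latin_ratio(text: str) -> float:
    """Return the share of alphabetic characters that are Latin-script."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    # Unicode names of Latin letters contain "LATIN", e.g.
    # "LATIN SMALL LETTER C WITH CEDILLA" for "ç".
    latin = sum(1 for ch in letters if "LATIN" in unicodedata.name(ch, ""))
    return latin / len(letters)


def has_heavy_latin_content(text: str, threshold: float = 0.2) -> bool:
    """Flag text whose Latin-script share meets a hypothetical threshold."""
    return latin_ratio(text) >= threshold


example = "واش كتفهم le français؟"
print(f"{latin_ratio(example):.2f}", has_heavy_latin_content(example))
```

Flagged texts could be routed to manual review or reported separately, so that accuracy on heavily code-switched input is not silently mixed with accuracy on Arabic-only input.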

### **Generalization**

- **Limited performance on topics or vocabulary outside the training data.**

---

## **How to Use**

You can use this model with the Hugging Face Transformers library:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("NerdyPy/fine_tuned_model_sentiment_analysis")
model = AutoModelForSequenceClassification.from_pretrained("NerdyPy/fine_tuned_model_sentiment_analysis")

# Example text in Arabic
text = "العمل في هذا المكان كان رائعاً، ولكن شي مرات ما كاينش التنظيم"