---
license: apache-2.0
metrics:
- bleu
base_model:
- facebook/mbart-large-cc25
pipeline_tag: translation
---
# Moroccan Darija to English Translation Model (Fine-Tuned mBART)

This model is a fine-tuned version of Facebook's mBART, a multilingual sequence-to-sequence model capable of handling many language pairs, adapted specifically for the Moroccan Darija to English translation task. It was trained on a Moroccan Darija–English dataset to produce accurate translations from Darija to English.

## Model Overview

- **Model Type**: mBART (Multilingual BART)
- **Language Pair**: Moroccan Darija → English
- **Task**: Machine Translation
- **Training Dataset**: Fine-tuned on a custom dataset of Moroccan Darija–English translation pairs.

## Model Details

mBART is a transformer-based sequence-to-sequence model designed to handle multiple languages. It is particularly useful for tasks such as translation, text generation, and summarization.

For this task, the model has been fine-tuned to translate text from **Moroccan Darija** to **English**, making it well suited to conversational and informal text from Morocco.

## Intended Use

This model can be used to:

- Translate sentences from Moroccan Darija to English.

## How to Use the Model

You can load the model and tokenizer with the Hugging Face `transformers` library:

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model_name = 'echarif/mBART_for_darija_transaltion'
model = MBartForConditionalGeneration.from_pretrained(model_name)
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)

# Prepare your input text (Moroccan Darija)
input_text = "insert your Moroccan Darija sentence here"

# Tokenize the input text
inputs = tokenizer(input_text, return_tensors="pt", padding=True)

# Generate the translated output.
# If generations come back in the source language, forcing the English
# target token may help (uses the mBART-50 language-code vocabulary):
# translated_tokens = model.generate(
#     **inputs, forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"]
# )
translated_tokens = model.generate(**inputs)
translated_text = tokenizer.decode(translated_tokens[0], skip_special_tokens=True)

print(f"Translated Text: {translated_text}")
```
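## Evaluation

The metadata above lists BLEU as the evaluation metric. As a rough illustration of what BLEU measures, here is a minimal, simplified sentence-level implementation (smoothed n-gram precision plus a brevity penalty); this is a sketch for intuition only, and real evaluation should use a maintained implementation such as `sacrebleu`:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def simple_bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU: smoothed n-gram precisions
    combined with a brevity penalty. Illustrative only."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        # add-one smoothing keeps the log defined when an n-gram order misses
        log_precisions.append(math.log((overlap + 1) / (total + 1)))
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

print(simple_bleu("the cat sat on the mat", "the cat sat on the mat"))  # prints 1.0
```

An exact match scores 1.0, and scores fall toward 0 as the hypothesis shares fewer n-grams with the reference or becomes much shorter than it.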