---
license: apache-2.0
metrics:
- bleu
base_model:
- facebook/mbart-large-cc25
pipeline_tag: translation
---
# Moroccan Darija to English Translation Model (Fine-Tuned mBART)

This model is a fine-tuned version of Facebook's mBART, a multilingual sequence-to-sequence model capable of handling many language pairs, adapted specifically for the Moroccan Darija to English translation task. It was trained on a Moroccan Darija–English dataset to produce accurate translations from Darija to English.

## Model Overview

- **Model Type**: mBART (Multilingual BART)
- **Language Pair**: Moroccan Darija → English
- **Task**: Machine Translation
- **Training Dataset**: Fine-tuned on a custom dataset of Moroccan Darija–English translation pairs.

## Model Details

mBART is a transformer-based sequence-to-sequence model designed to handle multiple languages. It is particularly useful for tasks such as translation, text generation, and summarization.

For this task, the model has been fine-tuned to translate text from **Moroccan Darija** to **English**, making it well suited to conversational and informal text from Morocco.

## Intended Use

This model can be used to:

- Translate sentences from Moroccan Darija to English.

## How to Use the Model

You can load the model and tokenizer with the Hugging Face `transformers` library:

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model_name = 'echarif/mBART_for_darija_transaltion'
model = MBartForConditionalGeneration.from_pretrained(model_name)
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)

# Prepare your input text (Moroccan Darija)
input_text = "insert your Moroccan Darija sentence here"

# Tokenize the input text
inputs = tokenizer(input_text, return_tensors="pt", padding=True)

# Generate the translated output.
# If generations come back in the source language, forcing the English
# target token may help (uses the mBART-50 language-code vocabulary):
# translated_tokens = model.generate(
#     **inputs, forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"]
# )
translated_tokens = model.generate(**inputs)
translated_text = tokenizer.decode(translated_tokens[0], skip_special_tokens=True)

print(f"Translated Text: {translated_text}")
```
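## Evaluation

The metadata above lists BLEU as the evaluation metric. As a rough illustration of what BLEU measures, here is a minimal, simplified sentence-level implementation (smoothed n-gram precision plus a brevity penalty); this is a sketch for intuition only, and real evaluation should use a maintained implementation such as `sacrebleu`:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def simple_bleu(hypothesis, reference, max_n=4):
    """Simplified sentence-level BLEU: smoothed n-gram precisions
    combined with a brevity penalty. Illustrative only."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        # add-one smoothing keeps the log defined when an n-gram order misses
        log_precisions.append(math.log((overlap + 1) / (total + 1)))
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

print(simple_bleu("the cat sat on the mat", "the cat sat on the mat"))  # prints 1.0
```

An exact match scores 1.0, and scores fall toward 0 as the hypothesis shares fewer n-grams with the reference or becomes much shorter than it.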