---
license: apache-2.0
metrics:
- bleu
base_model:
- facebook/mbart-large-cc25
pipeline_tag: translation
---

# Moroccan Darija to English Translation Model (Fine-Tuned mBART)

This model is a fine-tuned version of Facebook's mBART, a multilingual sequence-to-sequence model capable of handling many language pairs. It has been fine-tuned on a Moroccan Darija dataset to translate from Moroccan Darija to English.

## Model Overview

- **Model Type**: mBART (Multilingual BART)
- **Language Pair**: Moroccan Darija → English
- **Task**: Machine Translation
- **Training Dataset**: A custom dataset of Moroccan Darija–English translation pairs.

## Model Details

mBART is a transformer-based sequence-to-sequence model designed to handle multiple languages, and it is particularly useful for tasks such as translation, text generation, and summarization. For this task, the model has been fine-tuned to translate text from **Moroccan Darija** to **English**, making it suitable for applications involving conversational and informal text from Morocco.

## Intended Use

This model can be used to:

- Translate sentences from Moroccan Darija to English.

## How to Use the Model

You can load the model and tokenizer with the Hugging Face `transformers` library. Here's an example:

```python
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_name = 'echarif/mBART_for_darija_transaltion'
model = MBartForConditionalGeneration.from_pretrained(model_name)
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)

# Prepare your input text (Moroccan Darija)
input_text = "insert your Moroccan Darija sentence here"

# Tokenize the input text
inputs = tokenizer(input_text, return_tensors="pt", padding=True)

# Generate the translation and decode it back to text
translated_tokens = model.generate(**inputs)
translated_text = tokenizer.decode(translated_tokens[0], skip_special_tokens=True)

print(f"Translated Text: {translated_text}")
```
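
The same model and tokenizer can also translate several sentences at once. The snippet below is a minimal batch-translation sketch, not confirmed by the model card: the `ar_AR` source-language code and the forced `en_XX` target token are assumptions about how the checkpoint was fine-tuned and may not be needed for this model.

```python
# Batch translation sketch. Assumes the checkpoint follows mBART-50 conventions;
# the ar_AR / en_XX language codes are assumptions, not stated in the model card.
sentences = [
    "first Darija sentence",
    "second Darija sentence",
]

tokenizer.src_lang = "ar_AR"  # assumed source-language code used for Darija
inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)

translated_tokens = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["en_XX"],  # force English output
    max_length=128,
)
translations = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)

for src, tgt in zip(sentences, translations):
    print(f"{src} -> {tgt}")
```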