---
license: cc-by-nc-4.0
datasets:
- barbaroo/Sprotin_parallel
- barbaroo/fo_en_synthetic
language:
- en
- fo
metrics:
- bleu
- chrf
- bertscore
base_model:
- facebook/nllb-200-distilled-600M
pipeline_tag: translation
---

# barbaroo/nllb_200_600M_en_fo

## Model Description

- **Model Architecture**: This model is based on the [NLLB 600M architecture](https://huggingface.co/facebook/nllb-200-distilled-600M) and weights.
- **Languages**: This checkpoint is fine-tuned to translate from **English** (`en`) to **Faroese** (`fo`).
- **Size**: ~600M parameters.
- **Finetuning Datasets**:
  - [Sprotin_parallel](https://huggingface.co/datasets/barbaroo/Sprotin_parallel)
  - [fo_en_synthetic](https://huggingface.co/datasets/barbaroo/fo_en_synthetic)
- **Training Regime**: Trained until convergence (about 2 epochs).
- **License**: Inherits the license of the original [NLLB 600M model](https://huggingface.co/facebook/nllb-200-distilled-600M) (CC-BY-NC-4.0).

## Intended Use

- **Primary Use Case**: Translating text from English to Faroese.
- **Audience**: Researchers, developers, and anyone interested in Faroese language processing.
- **Usage Scenarios**:
  - Building English-to-Faroese translation tools
  - Language research and corpus analysis
  - Synthetic data creation

> **Important**: While the model can produce fluent translations, it is not guaranteed to be accurate on all inputs. Critical or sensitive content should be verified by human experts.

## Metrics

- **Model performance measures**: This model was evaluated using **BLEU**, **chrF**, and **BERTScore**, metrics widely adopted by the machine translation community.

---

## Evaluation Data

- **Datasets**: The Flores-200 dataset is described in Section 4 of the NLLB paper.
- **Motivation**: Flores-200 is currently the only machine translation benchmark available for Faroese.
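chrF, one of the metrics listed under Metrics, scores a translation by character n-gram overlap with a reference, which is often preferred for morphologically rich languages such as Faroese. The sketch below is a simplified, sentence-level illustration only, under these assumptions: whitespace is removed, n-gram orders 1 to 6 are used, precision and recall are averaged across orders, and they are combined with an F-score using β = 2. The function names `char_ngrams` and `chrf` are hypothetical; this is not the evaluation code used for this model, and its numbers will not necessarily match standard implementations.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of `text` after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Hypothetical, simplified sentence-level chrF on a 0-100 scale.

    Averages character n-gram precision and recall over orders 1..max_order,
    then combines them with an F-beta score (beta=2 weights recall higher).
    """
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)


if __name__ == "__main__":
    # Toy strings for illustration, not actual model outputs
    reference = "Hvussu hevur tú tað?"
    hypothesis = "Hvussu hevur tú?"
    print(f"chrF: {chrf(hypothesis, reference):.1f}")
```

For scores comparable with published results, a standard implementation such as sacreBLEU (`sacrebleu.corpus_chrf`, `sacrebleu.corpus_bleu`) is the usual choice, since it aggregates statistics over the whole test set rather than per sentence.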
## How to Use

Below is a simple usage example in Python with [Hugging Face Transformers](https://github.com/huggingface/transformers):

```python
from transformers import pipeline

model_name = "barbaroo/nllb_200_600M_en_fo"

translator = pipeline(
    "translation",
    model=model_name,
    tokenizer=model_name,
    src_lang="eng_Latn",  # NLLB (FLORES-200) language code for English
    tgt_lang="fao_Latn",  # NLLB (FLORES-200) language code for Faroese
)

text = "Hello, how are you?"
translation = translator(text)

# The pipeline returns a list of dicts, one per input text
print(translation[0]["translation_text"])
```

## Citation

If you use this model or find it helpful in your research, please cite:

[COMING SOON]

## Contact

For questions, feedback, or collaboration inquiries, feel free to reach out:

- **Primary Contact**: Barbara Scalvini (barbaras@setur.fo, barbaralongview@gmail.com)