---
license: cc-by-nc-4.0
datasets:
- barbaroo/Sprotin_parallel
- barbaroo/fo_en_synthetic
language:
- en
- fo
metrics:
- bleu
- chrf
- bertscore
base_model:
- facebook/nllb-200-distilled-600M
pipeline_tag: translation
---
# barbaroo/nllb_200_600M_en_fo
## Model Description
- **Model Architecture**: This model is based on the [NLLB 600M architecture](https://huggingface.co/facebook/nllb-200-distilled-600M) and weights.
- **Languages**: This checkpoint is fine-tuned to translate from **English** (`en`) to **Faroese** (`fo`).
- **Size**: ~600M parameters.
- **Finetuning Datasets**:
  - [Sprotin_parallel](https://huggingface.co/datasets/barbaroo/Sprotin_parallel)
  - [fo_en_synthetic](https://huggingface.co/datasets/barbaroo/fo_en_synthetic)
- **Training Regime**: Trained until convergence (about 2 epochs).
- **License**: Inherits the license of the original [NLLB 600M model](https://huggingface.co/facebook/nllb-200-distilled-600M) (CC-BY-NC 4.0).
## Intended Use
- **Primary Use Case**: Translate text from English to Faroese.
- **Audience**: Researchers, developers, or anyone interested in Faroese language processing.
- **Usage Scenarios**:
  - Building Faroese-English translation tools
  - Language research and corpus analysis
  - Synthetic data creation
> **Important**: While the model can produce fluent translations, it is not guaranteed to be perfectly accurate on all inputs. Users should verify critical or sensitive content through human experts.
## Metrics
- **Model performance measures**:
  The model was evaluated using **BLEU**, **chrF**, and **BERTScore**, metrics widely adopted by the machine translation community.
---
## Evaluation Data
- **Datasets**:
  The Flores-200 dataset is described in Section 4 of the NLLB paper/documentation.
- **Motivation**:
  Flores-200 is currently the only machine translation benchmark available for Faroese.
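
As a rough illustration of what chrF measures, the sketch below computes a simplified sentence-level character n-gram F-score in pure Python. It is not the evaluation script used for this model: the function names (`char_ngrams`, `simple_chrf`) and the toy strings are hypothetical, and the defaults (n-grams up to length 6, beta of 2) are assumed to mirror common chrF settings. For reported scores, an established tool such as sacreBLEU is the safer choice, and its numbers may differ from this sketch.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale (illustration only)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip n-gram orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)  # mean precision over orders
    r = sum(recalls) / len(recalls)        # mean recall over orders
    if p + r == 0:
        return 0.0
    # F-beta: beta = 2 weights recall more heavily than precision
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Calling `simple_chrf(model_output, reference_translation)` on a system output and its reference gives a score between 0 and 100, where higher means more character-level overlap.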
## How to Use
Below is a simple usage example in Python with [Hugging Face Transformers](https://github.com/huggingface/transformers):
```python
from transformers import pipeline

model_name = "barbaroo/nllb_200_600M_en_fo"

# Build a translation pipeline using the NLLB language codes
translator = pipeline(
    "translation",
    model=model_name,
    tokenizer=model_name,
    src_lang="eng_Latn",  # Language code for English
    tgt_lang="fao_Latn",  # Language code for Faroese
)

text = "Hello, how are you?"
translation = translator(text)
# The pipeline returns a list of dicts, e.g. [{"translation_text": "..."}]
print(translation)
```
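
For translating many sentences, it can be convenient to load the tokenizer and model directly and generate in batches. The following is a minimal sketch, assuming the checkpoint ships the standard NLLB tokenizer with the `eng_Latn` and `fao_Latn` language tokens (as the pipeline example suggests). The helper names `batched` and `translate_en_to_fo`, the batch size, and the `max_new_tokens` value are illustrative choices, not settings documented for this model.

```python
from typing import Iterator, List


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def translate_en_to_fo(sentences: List[str], batch_size: int = 8,
                       max_new_tokens: int = 256) -> List[str]:
    """Translate English sentences to Faroese in batches (hypothetical helper)."""
    # Imported lazily so `batched` stays usable without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    model_name = "barbaroo/nllb_200_600M_en_fo"
    tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # NLLB selects the target language via the first generated token.
    fo_token_id = tokenizer.convert_tokens_to_ids("fao_Latn")

    outputs: List[str] = []
    for batch in batched(sentences, batch_size):
        inputs = tokenizer(batch, return_tensors="pt",
                           padding=True, truncation=True)
        generated = model.generate(
            **inputs,
            forced_bos_token_id=fo_token_id,
            max_new_tokens=max_new_tokens,  # assumed cap, tune as needed
        )
        outputs.extend(
            tokenizer.batch_decode(generated, skip_special_tokens=True)
        )
    return outputs
```

Calling `translate_en_to_fo(["Hello, how are you?", "The weather is nice today."])` should return one Faroese string per input sentence, in the same order. The first call downloads the model weights.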
## Citation
If you use this model or find it helpful in your research, please cite: [COMING SOON]
## Contact
For questions, feedback, or collaboration inquiries, feel free to reach out:
- **Primary Contact**: < Barbara Scalvini/ [email protected] / [email protected] >