---
license: cc-by-nc-4.0
datasets:
- barbaroo/Sprotin_parallel
- barbaroo/fo_en_synthetic
language:
- en
- fo
metrics:
- bleu
- chrf
- bertscore
base_model:
- facebook/nllb-200-distilled-600M
pipeline_tag: translation
---

# barbaroo/nllb_200_600M_en_fo

## Model Description

- **Model Architecture**: Based on the architecture and weights of [NLLB 600M](https://huggingface.co/facebook/nllb-200-distilled-600M).
- **Languages**: This checkpoint is fine-tuned to translate from **English** (`en`) to **Faroese** (`fo`).
- **Size**: ~600M parameters.
- **Finetuning Datasets**:
  - [Sprotin_parallel](https://huggingface.co/datasets/barbaroo/Sprotin_parallel)
  - [fo_en_synthetic](https://huggingface.co/datasets/barbaroo/fo_en_synthetic)
- **Training Regime**: Trained until convergence (about 2 epochs).
- **License**: Inherits the license of the [NLLB 600M model](https://huggingface.co/facebook/nllb-200-distilled-600M) (`cc-by-nc-4.0`, as declared in this card's metadata).

## Intended Use

- **Primary Use Case**: Translate text from English to Faroese.
- **Audience**: Researchers, developers, or anyone interested in Faroese language processing.
- **Usage Scenarios**:
  - Building Faroese-English translation tools
  - Language research and corpus analysis
  - Synthetic data creation

> **Important**: While the model can produce fluent translations, it is not guaranteed to be accurate on all inputs. Have human experts verify critical or sensitive content.
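
For the synthetic data scenario, a common pattern is to translate a large list of English sentences in fixed-size batches. The following is a minimal sketch of that pattern, not part of this model's API: the helper name `translate_in_batches` and the default `batch_size` are illustrative, and `translator` is assumed to behave like a Transformers translation pipeline (a list of strings in, a list of dicts with a `"translation_text"` key out).

```python
from typing import Callable, Iterable, List


def translate_in_batches(
    translator: Callable[[List[str]], List[dict]],
    sentences: Iterable[str],
    batch_size: int = 16,
) -> List[str]:
    """Translate sentences in fixed-size batches and return the output strings.

    `translator` is assumed to behave like a Transformers translation
    pipeline: given a list of strings, it returns one dict per input with a
    "translation_text" key.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    translations: List[str] = []
    batch: List[str] = []
    for sentence in sentences:
        batch.append(sentence)
        if len(batch) == batch_size:
            translations.extend(r["translation_text"] for r in translator(batch))
            batch = []
    if batch:  # flush the final, possibly smaller batch
        translations.extend(r["translation_text"] for r in translator(batch))
    return translations
```

With a translation pipeline for this checkpoint bound to `translator` (created with `src_lang="eng_Latn"` and `tgt_lang="fao_Latn"`), `translate_in_batches(translator, english_sentences)` returns the Faroese strings in input order. Synthetic output should still be spot-checked, as the note above advises.
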

## Metrics

- **Model performance measures**:
  The model was evaluated using **BLEU**, **chrF**, and **BERTScore**, metrics widely adopted by the machine translation community.
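
To make one of these metrics concrete, here is a simplified, pure-Python sketch of sentence-level chrF: character n-gram precision and recall are averaged over orders 1 to 6 and combined into an F-score with β = 2. This card does not say which implementation or settings produced its scores, so the function name `simple_chrf` and the defaults below are assumptions for illustration only. For numbers comparable with published results, an established tool such as sacreBLEU is the safer choice.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale (illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        if hyp_total == 0 or ref_total == 0:
            continue  # skip orders longer than either string
        matches = sum((hyp & ref).values())  # clipped n-gram overlap
        precisions.append(matches / hyp_total)
        recalls.append(matches / ref_total)

    if not precisions:
        return 0.0
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    if precision + recall == 0:
        return 0.0
    # F-beta: beta = 2 weights recall more heavily than precision.
    return 100 * (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
```

Identical strings score 100 and strings with no shared characters score 0; everything else falls in between.
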

## Evaluation Data

- **Datasets**:
  The Flores-200 dataset is described in Section 4 of the NLLB paper/documentation.
- **Motivation**:
  Flores-200 is currently the only machine translation benchmark available for Faroese.

## How to Use

Below is a simple usage example in Python with [Hugging Face Transformers](https://github.com/huggingface/transformers):

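```python
from transformers import pipeline

model_name = "barbaroo/nllb_200_600M_en_fo"

# NLLB checkpoints are multilingual, so pass the FLORES-200 language codes
# explicitly: eng_Latn = English (source), fao_Latn = Faroese (target).
translator = pipeline(
    "translation",
    model=model_name,
    tokenizer=model_name,
    src_lang="eng_Latn",
    tgt_lang="fao_Latn",
)

text = "Hello, how are you?"
translation = translator(text)

# The pipeline returns a list of dicts with a "translation_text" key.
print(translation[0]["translation_text"])
```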
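
For finer control over generation than the pipeline offers, the tokenizer and model can also be called directly. The sketch below assumes this checkpoint keeps the standard NLLB tokenizer, in which the target language is selected by forcing the first generated token to be its language code. The function names `translate_en_to_fo` and `load_en_fo` and the `max_new_tokens` default are illustrative choices, not something this card specifies.

```python
from typing import List


def translate_en_to_fo(texts: List[str], tokenizer, model,
                       max_new_tokens: int = 128) -> List[str]:
    """Translate English strings to Faroese with an NLLB tokenizer/model pair."""
    # Tokenize the inputs as one padded batch of PyTorch tensors.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # NLLB picks the output language from the first generated token, so force
    # it to the Faroese language code (fao_Latn).
    target_id = tokenizer.convert_tokens_to_ids("fao_Latn")
    output_ids = model.generate(
        **inputs,
        forced_bos_token_id=target_id,
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


def load_en_fo(model_name: str = "barbaroo/nllb_200_600M_en_fo"):
    """Load the tokenizer and model (downloads the weights on first use)."""
    # Imported lazily so the helper above can be used without loading anything.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # src_lang tells the tokenizer to tag inputs as English.
    tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    return tokenizer, model
```

A typical call would be `tokenizer, model = load_en_fo()` followed by `translate_en_to_fo(["Hello, how are you?"], tokenizer, model)`. Decoding options such as `num_beams` can be added to the `generate` call as needed.
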

## Citation

If you use this model or find it helpful in your research, please cite: [COMING SOON]

## Contact

For questions, feedback, or collaboration inquiries, feel free to reach out:

- **Primary Contact**: Barbara Scalvini / [email protected] / [email protected]