---
license: cc-by-nc-4.0
datasets:
- barbaroo/Sprotin_parallel
- barbaroo/fo_en_synthetic
language:
- en
- fo
metrics:
- bleu
- chrf
- bertscore
base_model:
- facebook/nllb-200-distilled-1.3B
pipeline_tag: translation
---
# barbaroo/nllb_200_1.3B_en_fo
## Model Description
- **Model Architecture**: This model is based on the [NLLB 1.3B architecture](https://huggingface.co/facebook/nllb-200-distilled-1.3B) and weights.
- **Languages**: This checkpoint is fine-tuned to translate from **English** (`en`) to **Faroese** (`fo`).
- **Size**: ~1.3B parameters.
- **Finetuning Datasets**:
  - [Sprotin_parallel](https://huggingface.co/datasets/barbaroo/Sprotin_parallel)
  - [fo_en_synthetic](https://huggingface.co/datasets/barbaroo/fo_en_synthetic)
- **Training Regime**: Trained until convergence (about 2 epochs).
- **License**: CC BY-NC 4.0, inherited from the original [NLLB 1.3B model](https://huggingface.co/facebook/nllb-200-distilled-1.3B).
## Intended Use
- **Primary Use Case**: Translate text from English to Faroese.
- **Audience**: Researchers, developers, or anyone interested in Faroese language processing.
- **Usage Scenarios**:
  - Building English-to-Faroese translation tools
  - Language research and corpus analysis
  - Synthetic data creation
> **Important**: While the model can produce fluent translations, it is not guaranteed to be perfectly accurate on all inputs. Users should verify critical or sensitive content through human experts.
## Metrics
- **Model performance measures**:
  This model was evaluated using **BLEU**, **chrF**, and **BERTScore**, metrics widely adopted by the machine translation community.
  Additionally, two human experts evaluated the model using the ESA framework on a small dataset (about 200 sentences) of English sentences from news outlets (BBC, CNN, Al Jazeera).
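
As a rough illustration of what chrF measures, the sketch below computes a simplified sentence-level character n-gram F-score in pure Python. It assumes n-gram orders 1 to 6, beta = 2, and whitespace removal, which are common chrF defaults. The function names `char_ngrams` and `simple_chrf` are hypothetical, and this is not the evaluation script used for this model; for reported scores, an established implementation such as sacreBLEU is preferable.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of order n after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level chrF on a 0-100 scale (illustrative only)."""
    precisions, recalls = [], []
    for n in range(1, max_order + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: beta = 2 weights recall more heavily than precision
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    # Made-up strings, used only to exercise the function
    print(simple_chrf("the cat sat on the mat", "the cat sat on a mat"))
```

Because the score is built from character n-grams rather than whole words, near-miss word forms still earn partial credit, which is one reason chrF is often reported alongside BLEU for morphologically rich languages.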
---
## Evaluation Data
- **Datasets**:
  The Flores-200 dataset, described in Section 4 of the NLLB paper/documentation.
- **Motivation**:
  Flores-200 is currently the only machine translation benchmark available for Faroese.
## How to Use
Below is a simple usage example in Python with [Hugging Face Transformers](https://github.com/huggingface/transformers):
```python
from transformers import pipeline

# Model ID of this checkpoint on the Hugging Face Hub
model_name = "barbaroo/nllb_200_1.3B_en_fo"

translator = pipeline(
    "translation",
    model=model_name,
    tokenizer=model_name,
    src_lang="eng_Latn",  # Language code for English
    tgt_lang="fao_Latn",  # Language code for Faroese
)

text = "Hello, how are you?"
translation = translator(text)  # list of dicts with a "translation_text" key
print(translation)
```
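
For larger jobs such as synthetic data creation, translating sentence by sentence can be slow. The following is a minimal sketch of a batching helper, assuming `translator` is the pipeline object created above and that it returns a list of dicts with a `translation_text` key when given a list of strings. The helper name `translate_in_batches` and the default batch size are hypothetical choices, not part of the model or the library.

```python
from typing import Callable, Iterable, List


def translate_in_batches(translator: Callable, sentences: Iterable[str],
                         batch_size: int = 8) -> List[str]:
    """Translate sentences in fixed-size batches, preserving input order."""
    sentences = list(sentences)
    outputs: List[str] = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        results = translator(batch)  # one result dict per input sentence
        outputs.extend(item["translation_text"] for item in results)
    return outputs
```

A call such as `translate_in_batches(translator, ["Good morning.", "See you tomorrow."])` would then return the translated strings in the same order as the inputs. The best batch size depends on available memory and sentence length, so it is worth tuning.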
## Citation
If you use this model or find it helpful in your research, please cite: [COMING SOON]
## Contact
For questions, feedback, or collaboration inquiries, feel free to reach out:
- **Primary Contact**: Barbara Scalvini / [email protected] / [email protected]