---
license: cc-by-nc-4.0
datasets:
- barbaroo/Sprotin_parallel
- barbaroo/fo_en_synthetic
language:
- en
- fo
metrics:
- bleu
- chrf
- bertscore
base_model:
- facebook/nllb-200-distilled-1.3B
pipeline_tag: translation
---
# barbaroo/nllb_200_1.3B_en_fo

## Model Description

- **Model Architecture**: This model is based on the [NLLB 1.3B architecture](https://huggingface.co/facebook/nllb-200-distilled-1.3B) and weights.  
- **Languages**: This checkpoint is fine-tuned to translate from **English** (`en`) to **Faroese** (`fo`).  
- **Size**: ~1.3B parameters.  
- **Finetuning Datasets**:
  - [Sprotin_parallel](https://huggingface.co/datasets/barbaroo/Sprotin_parallel)
  - [fo_en_synthetic](https://huggingface.co/datasets/barbaroo/fo_en_synthetic)
- **Training Regime**: Trained until convergence (about 2 epochs).
- **License**: Inherits the original licenses of the [NLLB 1.3B model](https://huggingface.co/facebook/nllb-200-distilled-1.3B).  

## Intended Use

- **Primary Use Case**: Translate text from English to Faroese.
- **Audience**: Researchers, developers, or anyone interested in Faroese language processing.
- **Usage Scenarios**:
  - Building Faroese-English translation tools  
  - Language research and corpus analysis
  - Synthetic data creation 

> **Important**: While the model can produce fluent translations, it is not guaranteed to be perfectly accurate on all inputs. Users should verify critical or sensitive content through human experts.


## Metrics

- **Model performance measures**:  
  This model was evaluated using **BLEU**, **chrF** and **BERTScore**, metrics widely adopted by the machine translation community (see the scoring sketch below).
  Additionally, human evaluation was performed by two expert annotators using the ESA framework on a small set (about 200 sentences) of English sentences from news outlets (BBC, CNN, Al Jazeera).
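
For reference, BLEU and chrF scores can be computed with the [sacrebleu](https://github.com/mjpost/sacrebleu) library. The snippet below is a minimal sketch with placeholder sentences; it is not the evaluation pipeline used for this model.

```python
# Minimal scoring sketch with sacrebleu (pip install sacrebleu).
# The hypothesis/reference pair below is a placeholder, not real evaluation data.
import sacrebleu

hypotheses = ["Hey, hvussu hevur tú tað?"]
references = [["Hey, hvussu gongur?"]]  # one list of references per reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)

print(f"BLEU: {bleu.score:.2f}")
print(f"chrF: {chrf.score:.2f}")
```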

## Evaluation Data

- **Datasets**:  
  The model was evaluated on the Flores-200 benchmark, which is described in Section 4 of the NLLB paper and documentation (see the loading sketch below).  
- **Motivation**:  
  Flores-200 is currently the only machine translation benchmark available for Faroese.
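
The English–Faroese portion of Flores-200 can be loaded from the Hugging Face Hub. The dataset name, configuration string and column names below are assumptions about the `facebook/flores` dataset layout; adjust them if your copy exposes different identifiers.

```python
# Hedged sketch: loading the English–Faroese Flores-200 split from the Hub.
from datasets import load_dataset

flores = load_dataset("facebook/flores", "eng_Latn-fao_Latn", trust_remote_code=True)

sample = flores["devtest"][0]
print(sample["sentence_eng_Latn"])  # English source sentence
print(sample["sentence_fao_Latn"])  # Faroese reference translation
```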

## How to Use

Below is a simple usage example in Python with [Hugging Face Transformers](https://github.com/huggingface/transformers):

```python
from transformers import pipeline

model_name = "barbaroo/nllb_200_1.3B_en_fo"

translator = pipeline(
    "translation",
    model=model_name,
    tokenizer=model_name,
    src_lang="eng_Latn",   # Language code for English
    tgt_lang="fao_Latn"    # Language code for Faroese
)

text = "Hello, how are you?"
translation = translator(text)
print(translation)
```
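
If you need more control (batching, generation parameters), the model can also be loaded directly with `AutoModelForSeq2SeqLM`. The sketch below follows the standard NLLB usage pattern rather than anything specific to this checkpoint; the example sentences are placeholders.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "barbaroo/nllb_200_1.3B_en_fo"

tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

sentences = ["Hello, how are you?", "The weather is nice today."]
inputs = tokenizer(sentences, return_tensors="pt", padding=True)

# Force the decoder to start with the Faroese language token.
with torch.no_grad():
    generated = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids("fao_Latn"),
        max_length=128,
    )

print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```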

## Citation

If you use this model or find it helpful in your research, please cite: [COMING SOON]

## Contact

For questions, feedback, or collaboration inquiries, feel free to reach out:

- **Primary Contact**: Barbara Scalvini ([email protected])