---
language:
- de
- en
- es
- fr
- it
- ja
- ru
- uk
- multilingual
license: cc-by-sa-4.0
tags:
- translation
---
# TakoMT
This is a translation model built with Marian-NMT.
For more details, please see [my repository](https://github.com/s-taka/fugumt).
In addition to the data listed in the repository, I also used [ParaCrawl](https://paracrawl.eu/).
* source languages: de, en, es, fr, it, ru, uk
* target language: ja
### How to use
This model requires transformers and sentencepiece.
```shell
pip install transformers sentencepiece
```
You can use this model directly with a pipeline:
```python
from transformers import pipeline
tako_translator = pipeline('translation', model='staka/takomt')
tako_translator('This is a cat.')
```
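
To translate several sentences at once, the pipeline call can be wrapped in a small helper. The following is a minimal sketch, not part of the original usage instructions: `load_translator`, `translate_all`, `demo`, and the `batch_size` default are hypothetical names and values chosen for illustration. It relies on the translation pipeline returning one dict with a `translation_text` key per input sentence.

```python
from typing import Callable, Iterable, List


def load_translator(model_name: str = "staka/takomt"):
    """Build the translation pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    return pipeline("translation", model=model_name)


def translate_all(
    translator: Callable, sentences: Iterable[str], batch_size: int = 8
) -> List[str]:
    """Translate sentences in small batches and return plain strings.

    `batch_size` is an illustrative default, not a tuned value.
    """
    sentences = list(sentences)
    results: List[str] = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        # The pipeline returns one {"translation_text": ...} dict per input.
        outputs = translator(batch)
        results.extend(out["translation_text"] for out in outputs)
    return results


def demo() -> None:
    """Hypothetical usage example; call demo() to run it."""
    tako_translator = load_translator()
    for line in translate_all(tako_translator, ["This is a cat.", "This is a dog."]):
        print(line)
```

Calling `demo()` downloads the model and prints one Japanese translation per input sentence. Keeping the batches small bounds memory use when the input list is long.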
### Eval results
Evaluation results on 500 randomly selected sentences from [tatoeba](https://tatoeba.org/ja):

|source |target |BLEU(*1)|
|-------|-------|--------|
|de |ja |27.8 |
|en |ja |28.4 |
|es |ja |32.0 |
|fr |ja |27.9 |
|it |ja |24.3 |
|ru |ja |27.3 |
|uk |ja |29.8 |

(*1) sacrebleu --tokenize ja-mecab
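
An evaluation along these lines could be sketched in Python as follows. This is only an illustration under stated assumptions, not the script used to produce the table: `sample_pairs`, `bleu_ja`, the `(source, reference)` pair format, and the fixed seed are all hypothetical. The `ja-mecab` tokenizer matches the footnote and requires sacreBLEU's Japanese extra (`pip install "sacrebleu[ja]"`).

```python
import random
from typing import List, Sequence, Tuple


def sample_pairs(
    pairs: Sequence[Tuple[str, str]], k: int = 500, seed: int = 0
) -> List[Tuple[str, str]]:
    """Randomly pick up to k (source, reference) pairs, reproducibly.

    The seed is an assumption; the original sampling procedure is not documented.
    """
    rng = random.Random(seed)
    return rng.sample(list(pairs), min(k, len(pairs)))


def bleu_ja(hypotheses: List[str], references: List[str]) -> float:
    """Corpus-level BLEU using sacreBLEU's MeCab-based Japanese tokenizer."""
    import sacrebleu  # imported lazily; needs the "ja" extra for MeCab

    # One reference stream, aligned with the hypotheses sentence by sentence.
    return sacrebleu.corpus_bleu(hypotheses, [references], tokenize="ja-mecab").score
```

In use, the sampled source sentences would be translated with the pipeline, and the resulting translations passed to `bleu_ja` together with the sampled Japanese references.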