---
license: cc-by-sa-4.0
language:
- de
- en
- es
- fr
- it
- ja
- ru
- uk
tags:
- translation
---
# TakoMT
This is a translation model using Marian-NMT.
For more details, please see [my repository](https://github.com/s-taka/fugumt).
In addition to the data listed in the repository, I also used [ParaCrawl](https://paracrawl.eu/).
* source languages: de, en, es, fr, it, ru, uk
* target language: ja
### How to use
This model requires the `transformers` and `sentencepiece` packages.
```python
# Run in a Jupyter/Colab cell; in a regular shell, drop the leading "!"
!pip install transformers sentencepiece
```
You can use this model directly with a pipeline:
```python
from transformers import pipeline
tako_translator = pipeline('translation', model='staka/takomt')
tako_translator('This is a cat.')
```
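To translate several sentences at once, the pipeline can be called with a list. The sketch below is a minimal illustration, not part of the original model card: `translate_batch`, `load_translator`, and `demo` are hypothetical helper names, and it assumes the model needs no source-language tag, as the single-sentence example above suggests.

```python
from typing import Callable, List


def translate_batch(translator: Callable, sentences: List[str]) -> List[str]:
    """Translate a list of sentences and return only the translated strings."""
    # A transformers translation pipeline returns one dict per input,
    # each holding the result under the 'translation_text' key.
    outputs = translator(sentences)
    return [item["translation_text"] for item in outputs]


def load_translator():
    """Build the pipeline (downloads the model on first use)."""
    # Imported lazily so translate_batch can be reused without transformers.
    from transformers import pipeline
    return pipeline("translation", model="staka/takomt")


def demo() -> None:
    """Hypothetical demo: mixed-language input, Japanese output."""
    tako_translator = load_translator()
    sentences = ["This is a cat.", "Das ist eine Katze.", "C'est un chat."]
    for src, tgt in zip(sentences, translate_batch(tako_translator, sentences)):
        print(src, "->", tgt)
```

Calling `demo()` downloads the model and prints one Japanese translation per input sentence.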
### Eval results
Evaluation results on 500 randomly selected sentences from [Tatoeba](https://tatoeba.org/ja):
|source |target |BLEU(*1)|
|-------|-------|--------|
|de |ja |27.8 |
|en |ja |28.4 |
|es |ja |32.0 |
|fr |ja |27.9 |
|it |ja |24.3 |
|ru |ja |27.3 |
|uk |ja |29.8 |
(*1) BLEU computed with `sacrebleu --tokenize ja-mecab`.
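An evaluation of this kind could be reproduced roughly as follows. This is a sketch under assumptions, not the author's actual script: `sample_pairs` and `score_bleu` are hypothetical helpers, the seed and sampling method are illustrative, and the `ja-mecab` tokenizer needs the optional Japanese extras (`pip install "sacrebleu[ja]"`).

```python
import random
from typing import List, Tuple


def sample_pairs(pairs: List[Tuple[str, str]], k: int = 500,
                 seed: int = 0) -> List[Tuple[str, str]]:
    """Pick up to k (source, reference) pairs reproducibly."""
    # The seed is an assumption; the original sampling procedure is not documented.
    rng = random.Random(seed)
    return rng.sample(pairs, min(k, len(pairs)))


def score_bleu(hypotheses: List[str], references: List[str]) -> float:
    """Corpus-level BLEU with sacrebleu's MeCab-based Japanese tokenizer."""
    # Imported lazily: requires sacrebleu with the [ja] extras installed.
    import sacrebleu
    # corpus_bleu expects a list of reference streams, hence [references].
    return sacrebleu.corpus_bleu(hypotheses, [references],
                                 tokenize="ja-mecab").score
```

Given a translator such as the pipeline above, the hypotheses would be the model outputs for the sampled source sentences, and the references the matching Japanese sentences.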