
Japanese to English translator

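A Japanese-to-English translation model based on transformers' EncoderDecoderModel, with a Japanese BERT encoder (cl-tohoku/bert-base-japanese-v2) and a GPT-2 decoder (openai-community/gpt2).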

Usage

Demo

Please visit https://huggingface.co/spaces/sappho192/jesc-ja-en-translator-demo

Dependencies (PyPI)

  • torch
  • transformers
  • fugashi
  • unidic-lite
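fugashi is the MeCab wrapper that BertJapaneseTokenizer uses for word segmentation, and unidic-lite supplies the dictionary it reads. Note that the PyPI name and the import name differ for unidic-lite: it is imported as unidic_lite. The following is a minimal standard-library sketch that reports which dependencies are missing before you try to load the model; the helper name missing_dependencies is hypothetical.

```python
import importlib.util

# PyPI distribution name -> module name used in `import` statements.
# unidic-lite is the only one in this list whose import name differs.
DEPENDENCIES = {
    "torch": "torch",
    "transformers": "transformers",
    "fugashi": "fugashi",
    "unidic-lite": "unidic_lite",
}


def missing_dependencies(deps=DEPENDENCIES):
    """Return the PyPI names of dependencies whose modules cannot be found."""
    # find_spec() locates a top-level module without importing it,
    # so the check has no import side effects.
    return [pypi for pypi, module in deps.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All dependencies are installed.")
```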

Inference

import transformers
import torch

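# Source side: Japanese BERT tokenizer (MeCab word segmentation via fugashi, then WordPiece).
# Target side: GPT-2 tokenizer.
encoder_model_name = "cl-tohoku/bert-base-japanese-v2"
decoder_model_name = "openai-community/gpt2"
src_tokenizer = transformers.BertJapaneseTokenizer.from_pretrained(encoder_model_name)
trg_tokenizer = transformers.PreTrainedTokenizerFast.from_pretrained(decoder_model_name)
# Fine-tuned encoder-decoder weights for Japanese-to-English translation
model = transformers.EncoderDecoderModel.from_pretrained("sappho192/jesc-ja-en-translator")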


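def translate(text_src):
    # Tokenize the Japanese input; only input_ids are passed to the encoder.
    inputs = src_tokenizer(text_src, return_attention_mask=False, return_token_type_ids=False, return_tensors='pt')
    # Generate English token IDs, take the first (only) sequence, and drop the
    # leading decoder start token and the trailing end-of-sequence token.
    output = model.generate(**inputs, max_length=512)[0, 1:-1]
    # Decode the GPT-2 token IDs back into text.
    text_trg = trg_tokenizer.decode(output.cpu())
    return text_trg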

texts = [
    "逃げろ!",  # Should be "run!"
    "初めまして.",  # "nice to meet you."
    "よろしくお願いします.",  # "thank you."
    "夜になりました",  # "and then it got dark."
    "ご飯を食べましょう."  # "let's eat."
]

for text in texts:
    print(translate(text))
    print()
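The slice [0, 1:-1] in translate() assumes that the generated sequence always begins with one decoder start token and ends with one end-of-sequence (EOS) token. If generation stops at max_length before an EOS is produced, the slice drops the last real token instead. The following is a minimal plain-Python sketch of a more defensive post-processing step. The helper strip_generated is hypothetical, and the ID 50256 (GPT-2's <|endoftext|>) is only an assumed example; read the real start and EOS IDs from the model (for instance model.config.decoder_start_token_id and model.generation_config.eos_token_id) before relying on it.

```python
from typing import List, Sequence


def strip_generated(ids: Sequence[int], start_id: int, eos_id: int) -> List[int]:
    """Remove the leading decoder start token and everything from the first EOS on.

    Unlike the fixed slice ids[1:-1], this keeps the last real token when
    generation stopped at max_length without emitting EOS, and it also
    discards any padding that follows EOS in a batched output.
    """
    ids = list(ids)
    # Drop the decoder start token only if it is actually present.
    if ids and ids[0] == start_id:
        ids = ids[1:]
    # Cut at the first end-of-sequence token, if there is one.
    if eos_id in ids:
        ids = ids[: ids.index(eos_id)]
    return ids


# Hypothetical IDs: 50256 is GPT-2's <|endoftext|>; check the model's config
# for the start/EOS IDs this model actually uses.
START_ID = EOS_ID = 50256

print(strip_generated([50256, 11, 22, 33, 50256], START_ID, EOS_ID))  # [11, 22, 33]
print(strip_generated([50256, 11, 22, 33], START_ID, EOS_ID))         # [11, 22, 33]
```

To use it inside translate(), take model.generate(...)[0].tolist() instead of slicing, pass the list through strip_generated(), and give the result to trg_tokenizer.decode(), which also accepts a plain list of token IDs.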

Dataset

The model was trained on JESC (Japanese-English Subtitle Corpus), which is licensed under CC BY-SA 4.0.
Details about the dataset can be found in the following reference:

@ARTICLE{pryzant_jesc_2017,
  author        = {{Pryzant}, R. and {Chung}, Y. and {Jurafsky}, D. and {Britz}, D.},
  title         = "{JESC: Japanese-English Subtitle Corpus}",
  journal       = {ArXiv e-prints},
  archivePrefix = "arXiv",
  eprint        = {1710.10639},
  keywords      = {Computer Science - Computation and Language},
  year          = 2017,
  month         = oct,
}
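JESC is distributed as plain-text files of sentence pairs. Assuming each line holds an English sentence and its Japanese counterpart separated by a tab, English first (check the files you download, since this column order is an assumption), the sketch below reads such lines into (Japanese, English) pairs, the source/target order this model translates in. The function name read_jesc_pairs is hypothetical, and the sample lines are toy data built from the examples above, not taken from JESC.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_jesc_pairs(lines: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (japanese, english) pairs from tab-separated 'english<TAB>japanese' lines.

    The column order is an assumption about the JESC files; swap the
    unpacking below if your copy is ordered differently.
    """
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 2 tab-separated fields, got {len(parts)}")
        english, japanese = parts
        # Put the source language (Japanese) first to match ja -> en translation.
        yield japanese.strip(), english.strip()


sample = io.StringIO("run!\t逃げろ!\nlet's eat.\tご飯を食べましょう.\n")
for ja, en in read_jesc_pairs(sample):
    print(ja, "->", en)
```

For a downloaded split, open the file with encoding="utf-8" and pass the file object in place of the StringIO; the file path depends on how you obtained the corpus.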
Model details

  • Format: Safetensors
  • Model size: 289M params
  • Tensor types: I64, F32, BOOL