en to zh does not work

#14
by xiaoyaolangzi - opened

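from transformers import AutoModelWithLMHead, AutoTokenizer, pipeline
# Note: AutoModelWithLMHead is deprecated; transformers recommends
# AutoModelForSeq2SeqLM for encoder-decoder models such as Marian.
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-zh")
model = AutoModelWithLMHead.from_pretrained("Helsinki-NLP/opus-mt-en-zh")
translation = pipeline("translation_en_to_zh", model=model, tokenizer=tokenizer)
# Alternative tried: let the pipeline load model and tokenizer by name
#translation = pipeline("translation", model="Helsinki-NLP/opus-mt-en-zh")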

text = "hello"
result = translation(text, max_length=40)[0]["translation_text"]
Instead of a translation, the result is the single character 哈 repeated over and over:
哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈哈

transformers 4.31.0

Check this link: https://huggingface.co/docs/transformers/model_doc/marian
Using MarianTokenizer and MarianMTModel directly, as on that page, works:

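# MarianTokenizer requires the sentencepiece package
from transformers import MarianMTModel, MarianTokenizer

# English sentences to translate into Chinese
src_text = [
    "Hello, Good to see you.",
    "It's a beautiful day!",
    "Good moods are the most important.",
]

model_name = "Helsinki-NLP/opus-mt-en-zh"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Tokenize the batch (padded to equal length) and generate translations
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
# Turn generated token IDs back into text, dropping special tokens such as <pad> and </s>
res = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
print(res)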

The result is a correct translation of all three sentences:

['δ½ ε₯½,εΎˆι«˜ε…΄θ§εˆ°δ½ γ€‚', 'θΏ™ζ˜―δΈ€δΈͺηΎŽδΈ½ηš„δΈ€ε€©!', 'θ‰―ε₯½ηš„ζƒ…η»ͺζ˜―ζœ€ι‡θ¦ηš„γ€‚']

I have also encountered this problem. Have you solved it?

Sign up or log in to comment