this model doesn't work
#4, opened by lawless-m
Even on the example page, the input
My name is Sarah and I live in London
comes out as
Λέ με λένε Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά και μέ μέ μέ μέ
Indeed, this seems to have slipped through the cracks! Will push something.
Hey @lawless-m, sorry for the delay, but the model does work! See below:
As written in the README:
a sentence initial language token is required in the form of >>id<< (id = valid target language ID)
For any Helsinki-NLP model, you can list the supported IDs with:
>>> from transformers import MarianTokenizer
>>> model_name = "Helsinki-NLP/opus-mt-en-grk"
>>> tokenizer = MarianTokenizer.from_pretrained(model_name)
>>> print(tokenizer.supported_language_codes)
['>>ell<<']
I tested it on newer versions of transformers as well, and it works well! See the following snippet:
from transformers import MarianMTModel, MarianTokenizer

# Each source sentence must start with the target language token.
src_text = [
    ">>ell<< Yesterday was my birthday"
]

model_name = "Helsinki-NLP/opus-mt-en-grk"
tokenizer = MarianTokenizer.from_pretrained(model_name)
print(tokenizer.supported_language_codes)  # lists the valid >>id<< tokens

model = MarianMTModel.from_pretrained(model_name)
# Tokenize the batch and generate the translations.
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
# Decode the generated token IDs back into text.
print([tokenizer.decode(t, skip_special_tokens=True) for t in translated])
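If you would rather not type the token by hand, something like the following might help. It is only a rough sketch: `add_language_token` is a hypothetical helper, not part of transformers, and the `supported_codes` list is hard-coded from the output shown above. In real use you would presumably pass `tokenizer.supported_language_codes` instead.

```python
from typing import Iterable, List


def add_language_token(texts: Iterable[str], lang_id: str, supported: Iterable[str]) -> List[str]:
    """Prefix each sentence with the >>id<< token, e.g. '>>ell<< Hello'.

    Hypothetical helper for illustration; not a transformers API.
    """
    token = f">>{lang_id}<<"
    supported = list(supported)
    if token not in supported:
        raise ValueError(f"{token} is not one of {supported}")
    prefixed = []
    for text in texts:
        stripped = text.strip()
        # Leave the sentence alone if it already starts with the token.
        if stripped.startswith(token):
            prefixed.append(stripped)
        else:
            prefixed.append(f"{token} {stripped}")
    return prefixed


# Assumption: the codes printed above for Helsinki-NLP/opus-mt-en-grk.
supported_codes = [">>ell<<"]
src_text = add_language_token(
    ["My name is Sarah and I live in London"], "ell", supported_codes
)
print(src_text)  # ['>>ell<< My name is Sarah and I live in London']
```

The resulting `src_text` can be passed to the tokenizer exactly as in the snippet above. Raising on an unknown ID is a design choice here, so that a missing or mistyped token fails loudly instead of producing repeated output like the one reported.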