This model doesn't work

#4
by lawless-m - opened

It fails even on the example page. The input

My name is Sarah and I live in London

comes out as:

Λέ με λένε Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά Σά και μέ μέ μέ μέ

Language Technology Research Group at the University of Helsinki

Indeed, it seems this slipped through the cracks! Will push something.

Hey @lawless-m, sorry for the delay, but the model does work!

As written in the README:

a sentence initial language token is required in the form of >>id<< (id = valid target language ID)
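Because the token has to come first, it can be less error-prone to add it programmatically than by hand. The sketch below uses a hypothetical helper, `add_language_token` (not part of transformers), that prefixes each sentence with `>>id<<` and leaves sentences that already carry a token alone, assuming plain Python strings as input.

```python
import re

# A sentence already starts with a language token if it begins with ">>id<<"
# followed by whitespace or the end of the string.
_LANG_TOKEN_RE = re.compile(r">>[^\s<>]+<<(?:\s|$)")


def add_language_token(sentences, lang_id):
    """Prefix each sentence with >>lang_id<< (hypothetical helper, not a transformers API)."""
    if not lang_id or "<" in lang_id or ">" in lang_id:
        raise ValueError("pass the bare ID, e.g. 'ell', not '>>ell<<'")
    token = f">>{lang_id}<<"
    result = []
    for sentence in sentences:
        sentence = sentence.strip()
        if _LANG_TOKEN_RE.match(sentence):
            # Already tagged: keep as-is instead of adding a second token.
            result.append(sentence)
        else:
            result.append(f"{token} {sentence}")
    return result


print(add_language_token(["My name is Sarah and I live in London"], "ell"))
# ['>>ell<< My name is Sarah and I live in London']
```

The resulting list can be passed to the tokenizer in place of the raw sentences.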

You can list the language IDs supported by any Helsinki-NLP model with:

>>> from transformers import MarianTokenizer
>>> model_name = "Helsinki-NLP/opus-mt-en-grk"
>>> tokenizer = MarianTokenizer.from_pretrained(model_name)
>>> print(tokenizer.supported_language_codes)
['>>ell<<']
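In a batch job it can be worth failing fast when the requested ID is not in that list. The sketch below assumes that language tokens are ordinary vocabulary entries shaped like `>>id<<`, which is roughly how `supported_language_codes` appears to be derived; `supported_codes`, `check_target` and the toy vocabulary are hypothetical and only illustrate the idea. In real code you would simply use `tokenizer.supported_language_codes`.

```python
import re

# Assumption: target-language tokens are vocabulary entries of the form ">>id<<".
LANG_CODE_RE = re.compile(r">>.+<<")


def supported_codes(vocab):
    """Return the vocabulary entries that look like target-language tokens."""
    return [tok for tok in vocab if LANG_CODE_RE.match(tok)]


def check_target(lang_id, vocab):
    """Return '>>lang_id<<' if the vocabulary contains it, else raise (hypothetical helper)."""
    token = f">>{lang_id}<<"
    codes = supported_codes(vocab)
    if token not in codes:
        raise ValueError(f"{token} is not supported; choose one of {codes}")
    return token


# Toy vocabulary for illustration only; a real one would typically come from tokenizer.get_vocab().
toy_vocab = {"▁Yesterday": 0, ">>ell<<": 1, "▁was": 2}
print(supported_codes(toy_vocab))      # ['>>ell<<']
print(check_target("ell", toy_vocab))  # >>ell<<
```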

I also tested it on newer versions of transformers, and it works there too. See the following snippet:

from transformers import MarianMTModel, MarianTokenizer

# Every source sentence must start with the target-language token, here >>ell<< (Greek)
src_text = [
    ">>ell<< Yesterday was my birthday"
]

model_name = "Helsinki-NLP/opus-mt-en-grk"
tokenizer = MarianTokenizer.from_pretrained(model_name)
print(tokenizer.supported_language_codes)  # target-language tokens this model accepts

model = MarianMTModel.from_pretrained(model_name)
# Tokenize the batch, generate translations, then decode them back to text
translated = model.generate(**tokenizer(src_text, return_tensors="pt", padding=True))
print([tokenizer.decode(t, skip_special_tokens=True) for t in translated])
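The failing output in the original report was dominated by one token ("Σά") repeated back to back. As a rough sanity check on decoded translations, a sketch like the following flags outputs where a single whitespace-separated token repeats many times in a row. The helper names and the threshold of 4 are arbitrary assumptions, and a long run of repeats is only a heuristic signal, not proof that the language token was missing.

```python
def longest_repeat_run(text):
    """Length of the longest run of one whitespace-separated token repeated back to back."""
    best = run = 0
    prev = None
    for word in text.split():
        run = run + 1 if word == prev else 1
        prev = word
        best = max(best, run)
    return best


def looks_degenerate(text, max_run=4):
    """Flag text where a single token repeats more than max_run times in a row."""
    return longest_repeat_run(text) > max_run


bad = "Λέ με λένε Σά Σά Σά Σά Σά Σά και μέ μέ μέ μέ"  # shortened from the report above
print(longest_repeat_run(bad), looks_degenerate(bad))  # 6 True
```

For example, you could run `looks_degenerate` over the decoded list printed by the snippet above.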
