How to use this model for tokenization?
#1 opened by tprochenka
Hi, I tried to do tokenization with:
tokenizer = LongformerTokenizer.from_pretrained("sdadas/polish-longformer-base-4096")
I got an error saying vocab_file was not found. Indeed, there is no vocab.json in the repository; instead I see tokenizer.json. Could you please share a snippet showing how to do tokenization with your model?
Thanks!
Hi, the model supports the fast tokenizer format only. Use LongformerTokenizerFast instead of LongformerTokenizer:
from transformers import LongformerTokenizerFast
tokenizer = LongformerTokenizerFast.from_pretrained("sdadas/polish-longformer-base-4096")
encoded = tokenizer("Zażółcić gęślą jaźń.")
print(encoded.input_ids)
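For completeness, AutoTokenizer should also work here, since it resolves the fast tokenizer from tokenizer.json automatically. A minimal sketch (the decode call is only there to verify the round trip):

from transformers import AutoTokenizer

# AutoTokenizer picks up the fast tokenizer from tokenizer.json,
# so no vocab.json is needed in the repository.
tokenizer = AutoTokenizer.from_pretrained("sdadas/polish-longformer-base-4096")

encoded = tokenizer("Zażółcić gęślą jaźń.")
print(encoded.input_ids)

# Decode the ids back to text to confirm the round trip.
print(tokenizer.decode(encoded.input_ids, skip_special_tokens=True))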
Thanks for the quick answer, it works :)
tprochenka changed discussion status to closed