Does Qwen 2.5 support Thai language?

#4
by Suppadate - opened

Here is my code:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
print(tokenizer.tokenize("สวัสดี"))

Output: ['สว', 'ั', 'สà¸Ķ', 'ี']

Can you suggest how to fix that or where I can find vocab.json, tokenizer.json, etc.?
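One possible explanation, offered as a sketch rather than a confirmed answer: Qwen 2.5 appears to use a GPT-2 style byte-level BPE tokenizer, where every UTF-8 byte is shown as a stand-in printable character. Under that assumption, pieces such as 'à¸Ķ' are not corrupted; they are the three UTF-8 bytes of a Thai character in display form. The table below is a re-implementation of the GPT-2 byte mapping, and `to_byte_level` / `from_byte_level` are hypothetical helper names used only for illustration.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style table mapping each byte 0..255 to a printable character."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Bytes without a printable form are moved to code points 256 and up.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def to_byte_level(text):
    """Hypothetical helper: show text the way a byte-level BPE vocabulary stores it."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))


def from_byte_level(token_text):
    """Hypothetical helper: turn byte-level display characters back into normal text."""
    return bytes(CHAR_TO_BYTE[c] for c in token_text).decode("utf-8")


# The piece displayed as 'à¸Ķ' in the output above, written with escapes.
piece = "\u00e0\u00b8\u0136"
print(from_byte_level(piece))  # ด

# Round trip for the whole word.
word = "สวัสดี"
print(from_byte_level(to_byte_level(word)) == word)  # True
```

If this assumption holds, nothing needs fixing: `tokenizer.convert_tokens_to_string(tokens)` or `tokenizer.decode(tokenizer.encode(text))` should give back the original Thai text. The tokenizer files (vocab.json, tokenizer.json, and so on) should be listed under the "Files and versions" tab of the model repository.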

Suppadate changed discussion title from Does Qwen 2.5 support That language? to Does Qwen 2.5 support Thai language?