How does this model count the token size?
#10
by WeiZhenKun · opened
How does this model count the token size?
Is there a roughly proportional relationship between the token count and the number of characters?
This model is based on the BERT tokenizer. As an approximate rule of thumb, there are roughly 0.75 words per token in English text. For a precise count, please load the tokenizer and run it on your data of interest.
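For example, here is a minimal sketch of counting tokens with the Hugging Face transformers library; the model id `intfloat/e5-base` is only an assumption here and should be replaced with this repo's actual id:

```python
from transformers import AutoTokenizer

# Load the tokenizer for this repo (id assumed for illustration; use this repo's id).
tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-base")

text = "How many tokens does this sentence use?"

# Tokenize the same way the model does; input_ids includes special tokens like [CLS]/[SEP].
token_ids = tokenizer(text)["input_ids"]
print("token count:", len(token_ids))

# Compare against the whitespace word count to see the rough words-per-token ratio.
print("words per token:", len(text.split()) / len(token_ids))
```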
intfloat changed discussion status to closed