Regarding the vocabulary used in the paper
#11 opened by jiaxin-wen
Thanks for your great work!
I have a question regarding the vocabulary. Specifically, the paper mentions that "We use GPT-Neo tokenizer but only keep the top 10K most common tokens". However, the currently uploaded vocabulary contains 50K tokens. Could you please upload the vocabulary that can be used to reproduce your experiments? :)
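For reference, a minimal sketch of how the uploaded tokenizer's vocabulary size can be checked with the `transformers` library (the repo id below is a placeholder, not the actual model repo):

```python
from transformers import AutoTokenizer

# Placeholder repo id; substitute the actual model/tokenizer repo.
tokenizer = AutoTokenizer.from_pretrained("your-repo-id")

# Reports the size of the uploaded vocabulary; the paper describes
# a 10K vocabulary, whereas the uploaded one appears to have 50K tokens.
print(len(tokenizer))
```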