Unify Tokenizers for Gemma-2-2B-it and Gemma-2-2B-jpn-it
Currently, the llama.cpp server requires the main model and the draft model to have identical tokenizers for speculative decoding to work. However, the tokenizers of gemma-2-2b-it and gemma-2-2b-jpn-it differ: token ID 255999 is `<start_of_image>` in gemma-2-2b-jpn-it but `<unused99>` in gemma-2-2b-it. This mismatch causes an error in the llama.cpp server, preventing gemma-2-2b-jpn-it from being used as a draft model.
Through experiments, I have found that gemma-2-2b-jpn-it is well-suited as a draft model for Japanese text generation, providing a speedup of approximately 1.35x on my CPU-only desktop. However, the tokenizer mismatch hinders its practical application.
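For context, this is roughly how a draft model is supplied to the llama.cpp server (a sketch only: the GGUF file names are placeholders, and the speculative-decoding flags `--model-draft` and `--draft-max` should be verified against your llama.cpp build with `llama-server --help`):

```shell
# Sketch: using gemma-2-2b-jpn-it as the draft model for speculative decoding.
# The GGUF paths below are placeholders, not actual file names.
./llama-server \
  --model gemma-2-2b-it-Q8_0.gguf \
  --model-draft gemma-2-2b-jpn-it-Q8_0.gguf \
  --draft-max 8
```

With the current tokenizers, this invocation fails the tokenizer-compatibility check because of the token-255999 mismatch.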
Since gemma-2-2b-jpn-it is designed for text generation and does not handle images, the `<start_of_image>` token appears unnecessary. I therefore propose unifying the tokenizers by replacing `<start_of_image>` (ID 255999) in gemma-2-2b-jpn-it's tokenizer with `<unused99>`, aligning it with gemma-2-2b-it.
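The replacement can be sketched as a small script over the tokenizer.json structure. This is a minimal illustration rather than the attached file: `rename_token` is a hypothetical helper, and it assumes the token appears in the top-level `added_tokens` list and in `model.vocab`, which should be verified against the actual gemma-2-2b-jpn-it tokenizer.json before applying.

```python
import json


def rename_token(tok: dict, old: str, new: str) -> dict:
    """Rename a token in a tokenizer.json-like structure, keeping its ID.

    Hypothetical helper: assumes the token may appear in the top-level
    "added_tokens" list and in model["vocab"]; check the real
    gemma-2-2b-jpn-it tokenizer.json layout before relying on this.
    """
    for entry in tok.get("added_tokens", []):
        if entry["content"] == old:
            entry["content"] = new
    vocab = tok.get("model", {}).get("vocab")
    if isinstance(vocab, dict) and old in vocab:
        vocab[new] = vocab.pop(old)  # the ID stays attached to the renamed key
    return tok


# Toy structure mirroring only the relevant parts of tokenizer.json.
tok = {
    "added_tokens": [{"id": 255999, "content": "<start_of_image>"}],
    "model": {"vocab": {"<start_of_image>": 255999}},
}
rename_token(tok, "<start_of_image>", "<unused99>")
print(json.dumps(tok))
```

After the rename, ID 255999 resolves to `<unused99>` in both tokenizers, so the llama.cpp compatibility check should no longer reject the pair.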
I have tested the modified tokenizer and confirmed that it works with the llama.cpp server. I am attaching the modified tokenizer file for review.
Thank you for your consideration.