Unify Tokenizers for Gemma-2-2B-it and Gemma-2-2B-jpn-it
Currently, the llama.cpp server requires the main model and the draft model to have identical tokenizers for speculative decoding to work. However, the tokenizers of gemma-2-2b-it and gemma-2-2b-jpn-it differ: token ID 255999 is `<start_of_image>` in gemma-2-2b-jpn-it but `<unused99>` in gemma-2-2b-it. This mismatch causes an error in the llama.cpp server, preventing gemma-2-2b-jpn-it from being used as a draft model.
Through experiments, I have found that gemma-2-2b-jpn-it is well-suited as a draft model for Japanese text generation, providing a speedup of approximately 1.35x on my CPU-only desktop. However, the tokenizer mismatch hinders its practical application.
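For context, this is roughly how a draft model is supplied to the llama.cpp server (a sketch only: the GGUF file names are placeholders, and the speculative-decoding flags `--model-draft` and `--draft-max` should be verified against your llama.cpp build with `llama-server --help`):

```shell
# Sketch: using gemma-2-2b-jpn-it as the draft model for speculative decoding.
# The GGUF paths below are placeholders, not actual file names.
./llama-server \
  --model gemma-2-2b-it-Q8_0.gguf \
  --model-draft gemma-2-2b-jpn-it-Q8_0.gguf \
  --draft-max 8
```

With the current tokenizers, this invocation fails the tokenizer-compatibility check because of the token-255999 mismatch.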
Since gemma-2-2b-jpn-it is designed for text generation and does not handle images, the `<start_of_image>` token appears unnecessary. I therefore propose unifying the tokenizers by replacing `<start_of_image>` (ID 255999) in gemma-2-2b-jpn-it's tokenizer with `<unused99>`, aligning it with gemma-2-2b-it.
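The replacement can be sketched as a small script over the tokenizer.json structure. This is a minimal illustration rather than the attached file: `rename_token` is a hypothetical helper, and it assumes the token appears in the top-level `added_tokens` list and in `model.vocab`, which should be verified against the actual gemma-2-2b-jpn-it tokenizer.json before applying.

```python
import json


def rename_token(tok: dict, old: str, new: str) -> dict:
    """Rename a token in a tokenizer.json-like structure, keeping its ID.

    Hypothetical helper: assumes the token may appear in the top-level
    "added_tokens" list and in model["vocab"]; check the real
    gemma-2-2b-jpn-it tokenizer.json layout before relying on this.
    """
    for entry in tok.get("added_tokens", []):
        if entry["content"] == old:
            entry["content"] = new
    vocab = tok.get("model", {}).get("vocab")
    if isinstance(vocab, dict) and old in vocab:
        vocab[new] = vocab.pop(old)  # the ID stays attached to the renamed key
    return tok


# Toy structure mirroring only the relevant parts of tokenizer.json.
tok = {
    "added_tokens": [{"id": 255999, "content": "<start_of_image>"}],
    "model": {"vocab": {"<start_of_image>": 255999}},
}
rename_token(tok, "<start_of_image>", "<unused99>")
print(json.dumps(tok))
```

After the rename, ID 255999 resolves to `<unused99>` in both tokenizers, so the llama.cpp compatibility check should no longer reject the pair.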
I have tested the modified tokenizer and confirmed that it works with the llama.cpp server. I am attaching the modified tokenizer file for review.
Thank you for your consideration.