Fix: add <eos> token at the end of chat template

#12
by adhi29 - opened

When we try to generate training data for instruction fine-tuning of the CodeGemma instruct model using the "apply_chat_template" function, the Jinja template does not add the <eos> token at the end of the rendered conversation. As a result, the model either never learns, or actively unlearns, when to emit the end-of-sequence token.
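A minimal sketch of how to reproduce this, assuming the "google/codegemma-7b-it" checkpoint (substitute whichever CodeGemma instruct repo you are fine-tuning):

```python
from transformers import AutoTokenizer

# Repo id is an assumption; use the CodeGemma instruct checkpoint you actually fine-tune.
tokenizer = AutoTokenizer.from_pretrained("google/codegemma-7b-it")

conversation = [
    {"role": "user", "content": "Write a function that reverses a string."},
    {"role": "assistant", "content": "def reverse(s):\n    return s[::-1]"},
]

# Render the conversation as training text: no generation prompt appended.
text = tokenizer.apply_chat_template(
    conversation,
    tokenize=False,
    add_generation_prompt=False,
)

# With the original template this prints False: the rendered text ends with
# "<end_of_turn>\n" but never with <eos>, so SFT examples never contain it.
print(text.endswith(tokenizer.eos_token))
```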

The expected behavior of the chat template is to produce training-ready text, i.e. text terminated with <eos>, when "add_generation_prompt" = False and "continue_final_message" = False. That is not the case here. The updated tokenizer_config.json in this pull request fixes that problem.
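For illustration, the kind of change involved looks roughly like the sketch below: a Gemma-style template that appends the eos_token when no generation prompt is requested. This is an assumption-laden sketch, not the exact Jinja shipped in this PR's tokenizer_config.json.

```python
# Continues from the snippet above (reuses the same `tokenizer` object).
chat_template = (
    "{{ bos_token }}"
    "{% for message in messages %}"
    "{% set role = 'model' if message['role'] == 'assistant' else message['role'] %}"
    "{{ '<start_of_turn>' + role + '\\n' }}"
    "{{ message['content'] | trim }}"
    "{{ '<end_of_turn>\\n' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ '<start_of_turn>model\\n' }}"
    "{% else %}"
    "{{ eos_token }}"  # the fix: training examples now end with <eos>
    "{% endif %}"
)

# Override the template in memory; with this, the same apply_chat_template
# call as above (add_generation_prompt=False) ends the rendered text with <eos>.
tokenizer.chat_template = chat_template
```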

adhi29 changed pull request title from "Upload tokenizer_config.json" to "Fix: add <eos> token at the end of chat template"
Ready to merge
This branch is ready to get merged automatically.