I'm confused about whether this chat model uses the standard ChatML template. Is the bos_token <|im_start|> or <|startoftext|>? Is the eos_token <|im_end|> or <|endoftext|>?
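Yi-1.5-34B-Chat-16K/config.json is not consistent with Yi-1.5-34B-Chat-16K/tokenizer_config.json: config.json sets eos_token_id to 2, which maps to <|endoftext|>, while tokenizer_config.json sets eos_token to <|im_end|> (id 7).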
As shown in Yi-1.5-34B-Chat-16K/config.json:
"bos_token_id": 1,
"eos_token_id": 2,
As shown in Yi-1.5-34B-Chat-16K/tokenizer_config.json:
"bos_token": "<|startoftext|>",
"eos_token": "<|im_end|>",
"1": {
"content": "<|startoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"2": {
"content": "<|endoftext|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
},
"7": {
"content": "<|im_end|>",
"lstrip": false,
"normalized": false,
"rstrip": false,
"single_word": false,
"special": true
}
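For anyone who wants to reproduce the check, here is a minimal sketch that compares the two files using only the fragments quoted above. The `added_tokens_decoder` key name and the `find_mismatches` helper are my assumptions for illustration (the snippet above does not show the parent key), and the per-token fields other than `content` are omitted for brevity.

```python
import json

# Fragment of config.json as quoted in this post.
config = json.loads('{"bos_token_id": 1, "eos_token_id": 2}')

# Fragment of tokenizer_config.json as quoted in this post.
# Assumption: the id -> token table lives under "added_tokens_decoder".
tokenizer_config = json.loads("""
{
  "bos_token": "<|startoftext|>",
  "eos_token": "<|im_end|>",
  "added_tokens_decoder": {
    "1": {"content": "<|startoftext|>"},
    "2": {"content": "<|endoftext|>"},
    "7": {"content": "<|im_end|>"}
  }
}
""")


def find_mismatches(config, tokenizer_config):
    """Hypothetical helper: report special tokens on which the two files disagree."""
    # Build an integer id -> token string lookup from the tokenizer side.
    id_to_token = {
        int(token_id): entry["content"]
        for token_id, entry in tokenizer_config["added_tokens_decoder"].items()
    }
    mismatches = {}
    for name in ("bos_token", "eos_token"):
        token_id = config[name + "_id"]           # id declared in config.json
        from_config = id_to_token.get(token_id)   # token string that id resolves to
        from_tokenizer = tokenizer_config[name]   # token string declared by the tokenizer
        if from_config != from_tokenizer:
            mismatches[name] = (token_id, from_config, from_tokenizer)
    return mismatches


mismatches = find_mismatches(config, tokenizer_config)
print(mismatches)  # {'eos_token': (2, '<|endoftext|>', '<|im_end|>')}
```

With the values quoted above, the bos_token agrees in both files and only the eos_token differs.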
My apologies for the late reply. We've updated tokenizer.json; did that resolve your issue?
Thanks!