vLLM <SpeechHere> token id

#3
by jitanus - opened

It seems that in prompt_token_ids the index for <SpeechHere> is 256000, so we get this error:
AssertionError: The text input contains 0 audio tokens, but 1 audios provided
because speech_token_index is expected to be 255999.
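To show why an id of 256000 produces this particular message, here is a minimal sketch of the kind of check that raises it. It assumes the processor just counts occurrences of speech_token_index in prompt_token_ids and compares that count with the number of audio inputs. The function names count_audio_tokens and check_audio_prompt, and the surrounding ids 2 and 108, are hypothetical; this is not vLLM's or MERaLiON's actual code.

```python
# Expected id for <SpeechHere>, per this thread (Gemma's former <unused99> slot).
SPEECH_TOKEN_INDEX = 255999


def count_audio_tokens(prompt_token_ids, speech_token_index=SPEECH_TOKEN_INDEX):
    """Count how many speech placeholder tokens appear in the prompt."""
    return sum(1 for tok in prompt_token_ids if tok == speech_token_index)


def check_audio_prompt(prompt_token_ids, num_audios):
    """Hypothetical check: exactly one placeholder token per audio input."""
    n = count_audio_tokens(prompt_token_ids)
    assert n == num_audios, (
        f"The text input contains {n} audio tokens, "
        f"but {num_audios} audios provided"
    )


if __name__ == "__main__":
    # Illustrative ids: placeholder encoded as 255999, so the check passes.
    check_audio_prompt([2, 255999, 108], num_audios=1)
    # Placeholder encoded as 256000 (as reported above): zero matches, so it fails.
    try:
        check_audio_prompt([2, 256000, 108], num_audios=1)
    except AssertionError as err:
        print(err)  # The text input contains 0 audio tokens, but 1 audios provided
```

Because the comparison is against a single fixed id, any tokenizer that encodes <SpeechHere> to a different id makes the count drop to zero, regardless of how the rest of the prompt looks.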

MERaLiON org
edited Jan 4

Hi, can you share your inference code?

Please make sure the tokenizer_mode argument is set to "slow". For now, we have simply modified Gemma's tokenizer_config.json, changing token 255999 from <unused99> to <SpeechHere>, but the Hugging Face fast tokenizer does not pick up this change.
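A minimal sketch of the fix, assuming vLLM's offline LLM entry point and Hugging Face transformers are installed, and that the MERaLiON checkpoint is already loadable by your vLLM install (any registration step is not shown). MODEL_PATH is a placeholder and llm_kwargs is a hypothetical helper; the real options involved are tokenizer_mode="slow" in vLLM and use_fast=False in transformers.

```python
# Placeholder: replace with the local path or Hub id of the MERaLiON checkpoint.
MODEL_PATH = "path/to/meralion-checkpoint"
EXPECTED_SPEECH_TOKEN_ID = 255999


def llm_kwargs(model_path):
    """Hypothetical helper: keyword arguments for vllm.LLM using the slow tokenizer."""
    return {
        "model": model_path,
        # "auto" (the default) prefers the fast tokenizer, which misses <SpeechHere>.
        "tokenizer_mode": "slow",
    }


def main():
    # Imports kept local so the helper above can be used without these packages.
    from transformers import AutoTokenizer
    from vllm import LLM

    # Sanity check: the slow tokenizer should map <SpeechHere> to 255999.
    tok = AutoTokenizer.from_pretrained(MODEL_PATH, use_fast=False)
    speech_id = tok.convert_tokens_to_ids("<SpeechHere>")
    if speech_id != EXPECTED_SPEECH_TOKEN_ID:
        raise ValueError(
            f"<SpeechHere> mapped to {speech_id}, expected {EXPECTED_SPEECH_TOKEN_ID}"
        )

    # Build the engine with the slow tokenizer so prompts carry id 255999.
    return LLM(**llm_kwargs(MODEL_PATH))


if __name__ == "__main__":
    main()
```

Checking the id with the slow tokenizer before starting the engine surfaces the mismatch as a clear error at load time, instead of as the audio-token assertion at request time.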

Thanks for your prompt reply. That was it! I had forgotten to set tokenizer_mode to "slow".

MERaLiON org

No problem :)

jitanus changed discussion status to closed
