vllm <SpeechHere> token id

by jitanus - opened Jan 3

Jan 3

It seems that in prompt_token_ids the id for "<SpeechHere>" is 256000, and we get this error:
AssertionError: The text input contains 0 audio tokens, but 1 audios provided
This happens because speech_token_index is expected to be 255999.
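
To make the mismatch concrete, here is a minimal sketch of the kind of check that could produce this error. It is not the actual vLLM or MERaLiON code: the helper names and the sample token ids are hypothetical, and only the two ids (255999 and 256000) and the error wording come from this thread.

```python
# Id the model expects for <SpeechHere>, as reported in this thread.
SPEECH_TOKEN_INDEX = 255999


def count_speech_tokens(prompt_token_ids, speech_token_index=SPEECH_TOKEN_INDEX):
    """Count how many times the speech placeholder id occurs in the prompt."""
    return sum(1 for token_id in prompt_token_ids if token_id == speech_token_index)


def check_audio_placeholders(prompt_token_ids, num_audios):
    """Hypothetical check: one placeholder id is expected per provided audio."""
    found = count_speech_tokens(prompt_token_ids)
    if found != num_audios:
        raise AssertionError(
            f"The text input contains {found} audio tokens, "
            f"but {num_audios} audios provided"
        )
    return found


# Made-up prompts for illustration; only the placeholder id differs.
ids_with_wrong_placeholder = [2, 256000, 1841]
ids_with_expected_placeholder = [2, 255999, 1841]

print(count_speech_tokens(ids_with_wrong_placeholder))
print(count_speech_tokens(ids_with_expected_placeholder))
```

With the placeholder encoded as 256000, the count of 255999 is zero, which matches the "0 audio tokens" in the error message.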

YingxuHe

MERaLiON org Jan 4

edited Jan 4

Hi, can you provide your inference code?

Please make sure the tokenizer_mode argument is set to "slow". For now, we have simply modified Gemma's tokenizer_config.json, changing token 255999 from <unused99> to <SpeechHere>, but the Hugging Face fast tokenizer does not recognize this change.
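
For reference, a minimal sketch of passing this setting through vLLM's Python API. The `tokenizer_mode` argument is the one named above; the helper names, the `model_id` placeholder, and the use of `trust_remote_code=True` are assumptions for illustration, so adjust them to your own setup.

```python
def build_llm_kwargs(model_id: str) -> dict:
    """Collect vLLM engine arguments, forcing the slow tokenizer."""
    return {
        "model": model_id,             # hypothetical: your model repo or local path
        "tokenizer_mode": "slow",      # the setting this thread is about
        "trust_remote_code": True,     # assumption: the model ships custom code
    }


def load_llm(model_id: str):
    """Create the vLLM engine; vllm is imported lazily so the helper above works without it."""
    from vllm import LLM

    return LLM(**build_llm_kwargs(model_id))


# Usage (requires vllm and suitable hardware):
# llm = load_llm("<your-model-repo-or-path>")
```

When using the vLLM command-line server instead, the equivalent option is `--tokenizer-mode slow`.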

jitanus

Jan 4

Thanks for your prompt reply. That was it! I missed setting tokenizer_mode to "slow".

YingxuHe

MERaLiON org Jan 4

no problem :)

jitanus changed discussion status to closed Jan 4
