Prompt length vs generation? Voice becomes scary at 200 tokens/characters

#3
by fsaudm - opened

Any similar experiences? The voice quality degrades significantly as the input text gets longer. Other times it simply doesn't read the whole sequence and cuts off at random places.

Yes, even after 100 tokens. I solved it by generating the audio sentence by sentence and concatenating the results at the end: https://huggingface.co/spaces/emirhanbilgic/read-my-pdf-outloud/blob/main/app.py
For voice consistency, you can use the speaker names (Gary, Jon, etc.)
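For reference, the sentence-by-sentence approach can be sketched roughly like this (a minimal sketch, not the code from app.py: `generate_audio` below is a placeholder for whatever TTS call you actually use, and the regex splitter is deliberately naive):

```python
import re
import numpy as np

def split_sentences(text):
    # Naive sentence splitter: break after ., !, or ? followed by whitespace.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

def generate_audio(sentence, sample_rate=16000):
    # Placeholder for the real TTS model call; here it just returns
    # silence whose length is proportional to the sentence length.
    return np.zeros(int(0.05 * sample_rate * len(sentence)), dtype=np.float32)

def read_out_loud(text):
    # Generate each sentence separately, then concatenate the waveforms.
    chunks = [generate_audio(s) for s in split_sentences(text)]
    return np.concatenate(chunks) if chunks else np.zeros(0, dtype=np.float32)

audio = read_out_loud("First sentence. Second one! Third?")
```

Keeping the same speaker name in every per-sentence prompt is what keeps the voice consistent across the concatenated chunks.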

Nice! Yeah, I thought about doing something along those lines; I will definitely check it out :D

@fsaudm
I used chunking and batch generation for longer sentences.
https://github.com/slabstech/llm-recipes/blob/main/python/notebooklm/audiobook/utils/batch_inference_chunked.py

Note: chunking degrades quality due to context loss.
I am trying to rewrite the text via an LLM into short, coherent sentences to maintain quality.
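One middle ground between whole-text and per-sentence generation (a sketch, not the code from the repo linked above) is to greedily pack consecutive sentences into chunks under a word budget, so each generation call still carries some local context:

```python
import re

def chunk_sentences(text, max_words=40):
    # Split into sentences, then greedily pack them into chunks that
    # stay under max_words each, preserving sentence order and context.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    chunks, current, count = [], [], 0
    for s in sentences:
        n = len(s.split())
        if current and count + n > max_words:
            chunks.append(' '.join(current))
            current, count = [], 0
        current.append(s)
        count += n
    if current:
        chunks.append(' '.join(current))
    return chunks
```

A sentence longer than `max_words` still becomes its own chunk, so nothing is dropped; the trade-off is only between chunk size (quality) and the model's length limit.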