Prompt length vs generation? Voice becomes scary at 200 tokens/characters

#3
by fsaudm - opened

Any similar experiences? The voice quality degrades significantly as the input text gets longer. Other times it simply doesn't read the whole sequence and cuts off at random places.

Yes, even after 100 tokens. I solved it by generating the audio sentence by sentence and concatenating the results at the end: https://huggingface.co/spaces/emirhanbilgic/read-my-pdf-outloud/blob/main/app.py
For voice consistency, you can use the speaker names (Gary, Jon, etc.)
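For reference, the sentence-by-sentence approach can be sketched roughly like this (a minimal sketch, not the code from app.py: `generate_audio` below is a placeholder for whatever TTS call you actually use, and the regex splitter is deliberately naive):

```python
import re
import numpy as np

def split_sentences(text):
    # Naive sentence splitter: break after ., !, or ? followed by whitespace.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

def generate_audio(sentence, sample_rate=16000):
    # Placeholder for the real TTS model call; here it just returns
    # silence whose length is proportional to the sentence length.
    return np.zeros(int(0.05 * sample_rate * len(sentence)), dtype=np.float32)

def read_out_loud(text):
    # Generate each sentence separately, then concatenate the waveforms.
    chunks = [generate_audio(s) for s in split_sentences(text)]
    return np.concatenate(chunks) if chunks else np.zeros(0, dtype=np.float32)

audio = read_out_loud("First sentence. Second one! Third?")
```

Keeping the same speaker name in every per-sentence prompt is what keeps the voice consistent across the concatenated chunks.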

Nice! Yeah, I thought about doing something along those lines; I will definitely check it out :D

@fsaudm
I used chunking and batch generation for longer sentences.
https://github.com/slabstech/llm-recipes/blob/main/python/notebooklm/audiobook/utils/batch_inference_chunked.py

Note: chunking degrades quality due to context loss.
I am trying to rewrite the text via an LLM into short, coherent sentences to maintain quality.
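One middle ground between whole-text and per-sentence generation (a sketch, not the code from the repo linked above) is to greedily pack consecutive sentences into chunks under a word budget, so each generation call still carries some local context:

```python
import re

def chunk_sentences(text, max_words=40):
    # Split into sentences, then greedily pack them into chunks that
    # stay under max_words each, preserving sentence order and context.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    chunks, current, count = [], [], 0
    for s in sentences:
        n = len(s.split())
        if current and count + n > max_words:
            chunks.append(' '.join(current))
            current, count = [], 0
        current.append(s)
        count += n
    if current:
        chunks.append(' '.join(current))
    return chunks
```

A sentence longer than `max_words` still becomes its own chunk, so nothing is dropped; the trade-off is only between chunk size (quality) and the model's length limit.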