It's not prompted. The source audio had that emotional context, and the model simply copied it.
Srinivas Billa
srinivasbilla
Recent Activity
new activity
about 17 hours ago
srinivasbilla/llasa-8b-tts:Apply for community grant: Personal project (gpu)
new activity
about 17 hours ago
srinivasbilla/llasa-3b-tts:Apply for community grant: Personal project (gpu)
commented on their article, 4 days ago
The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about...
srinivasbilla's activity
Apply for community grant: Personal project (gpu)
1
#2 opened about 17 hours ago
by
srinivasbilla
Apply for community grant: Personal project (gpu)
1
#7 opened about 17 hours ago
by
srinivasbilla
commented on
The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about...
4 days ago
Increase duration for the 8B model
#1 opened about 2 months ago
by
multimodalart

Please apply for a community GPU grant
2
#6 opened about 1 month ago
by
Pendrokar

commented on
The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about...
about 2 months ago
Yes! Thanks for letting me know
commented on
The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about...
2 months ago
Around 10 GB, and around 300 characters is the sweet spot. You can chunk the text and do it that way, though.
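The chunking suggested in the comment above could look roughly like the sketch below. It is only an illustration, not the author's code: the function name `chunk_text` and the choice to split on sentence boundaries are assumptions, and the 300-character default simply mirrors the "sweet spot" mentioned in the comment.

```python
import re


def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper: prefers sentence boundaries and falls back to
    word boundaries (or a hard cut) for over-long sentences.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is broken up on its own.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no space available, so cut mid-word
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:cut].strip())
            sentence = sentence[cut:].strip()
        if not sentence:
            continue
        # Greedily pack sentences into the current chunk while they fit.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be synthesized separately and the resulting audio clips concatenated. How that step is done depends on the model interface being used and is not shown here.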
Librarian Bot: Add language metadata for dataset
#1 opened 2 months ago
by
librarian-bot

Can we run this locally via docker?
4
#4 opened 2 months ago
by
pylotlight
Emotions
2
#3 opened 2 months ago
by
jujutechnology
Fixes 500 error for some users
#2 opened 2 months ago
by
Tonic

commented on
The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about...
2 months ago
I had a look at both, and it seems doable. I'll try to follow the repeng example, but it's a bit confusing how they generate the dataset.