ShoukanLabs/Vokan · Exploring Fine-Tuning and Persona Creation in StyleTTS2

Hi, great work on curating the dataset and fine-tuning the StyleTTS2 model! Since you have hands-on experience with this, I’d love to get your insights on a few things:

Would fine-tuning the model on conversational data, such as call center conversations, help make its output more conversational in style?
Can nonverbal cues like pauses, “um/uh” filler sounds, or breaths be incorporated to make the output feel more natural?
Is it possible to create distinct personas, such as Harry, who is articulate and confident, or Sarah, who is charming and fluent? (Feel free to suggest some creative persona ideas!)

Looking forward to your thoughts!