DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment
Paper • 2507.02768 • Published
Speech Processing, Self-Supervised Learning, ASR, TTS, Voice Conversion, Spoken Question Answering