Large-scale Pre-training for Grounded Video Caption Generation Paper • 2503.10781 • Published 13 days ago • 16
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio Paper • 2303.00747 • Published Mar 1, 2023 • 5
The VoxCeleb Speaker Recognition Challenge: A Retrospective Paper • 2408.14886 • Published Aug 27, 2024 • 10