EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning Paper • 2401.17690 • Published Jan 31, 2024 • 5