Audio Processing, Automatic Speech Recognition, Multilingual Translation, Vision-Language Models, Generative Dialogue System