Engage in multimedia chat with LLMs and ML models
Generate images from text prompts
Transcribe audio or YouTube videos into text