Detect objects in images using text prompts
A Foundation Action Model for Generalist GUI Agents
Convert HTML to Markdown
Generate answers to questions about images
Ask questions about images
Generate detailed descriptions from images and questions