Add vision capabilities.
#443, opened by FlipTip
Vision capabilities would be great and would make the free HuggingChat even better than the free ChatGPT.
One way to implement this would be to add a multimodal model such as idefics2-8b or Mantis-8B-siglip-llama3. Another would be to get image descriptions from moondream2: a model like Mistral-7b (or any other) could turn the user's question into a prompt for moondream2, and then the user's prompt and the resulting image description could be sent to whichever model the user wants to chat with.
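To make the moondream2 idea a bit more concrete, here is a rough sketch of the three-step flow. It is only an illustration, not how HuggingChat works or would have to work: `answer_with_image` and the three callables (`write_prompt`, `describe`, `chat`) are hypothetical names standing in for Mistral-7b, moondream2, and the user's chosen chat model, and the wording of the combined prompt is an assumption.

```python
from typing import Callable

# Hypothetical stand-ins, not real HuggingChat or model APIs:
PromptWriter = Callable[[str], str]      # e.g. Mistral-7b: user question -> captioning prompt
Captioner = Callable[[bytes, str], str]  # e.g. moondream2: (image, prompt) -> description
ChatModel = Callable[[str], str]         # the model the user wants to chat with


def answer_with_image(
    question: str,
    image: bytes,
    write_prompt: PromptWriter,
    describe: Captioner,
    chat: ChatModel,
) -> str:
    """Route an image question through a captioner, then a text-only chat model."""
    # Step 1: turn the user's question into a prompt for the captioning model.
    caption_prompt = write_prompt(question)
    # Step 2: get a description of the image that is focused on that prompt.
    description = describe(image, caption_prompt)
    # Step 3: send the description together with the original question
    # to the chat model (this prompt layout is an assumption).
    combined = f"Image description: {description}\n\nUser question: {question}"
    return chat(combined)


if __name__ == "__main__":
    # Stub models so the sketch runs without downloading anything.
    reply = answer_with_image(
        "What color is the bike?",
        b"<image bytes>",
        write_prompt=lambda q: f"Describe the image so that this can be answered: {q}",
        describe=lambda img, prompt: "A red bicycle leaning against a wall.",
        chat=lambda prompt: f"(chat model received)\n{prompt}",
    )
    print(reply)
```

Passing the models in as plain callables keeps the sketch independent of any particular inference backend, so the captioner or the chat model could be swapped without touching the flow itself.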