Could you publish results compared to Sonnet 3.5?
In my personal tests, Sonnet 3.5 is the SoTA model for describing figures, graphs, etc.
We do non-profit work for children with varying degrees of learning difficulties (e.g. converting diagrams and blackboard photos to text), and Anthropic does not yet have a consistent way to support such non-profits with API credits outside of some competitions, which our limited developer time keeps us from entering. So it would be extremely useful to know whether your model performs better!
Thanks for your advice, I will add Sonnet 3.5 to the comparison table.
I have added Sonnet 3.5 to the comparison table for the 76B model; see here:
https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B#image-benchmarks
Thank you! Great work on the model btw.
A bit off topic: what's your stance on models like VLM2Vec compared to VLMs? In an AI system it might make sense to route more complex queries to chunked visual embeddings, but use a VLM as a judge with guided_choice. A rough sketch of what I mean is below. What's your take on the matter?
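For concreteness, here is a minimal sketch of that routing idea. `embed_image_chunks` and `embed_text` are hypothetical stand-ins for a VLM2Vec-style embedder (not a real API), and the judge call assumes a vLLM OpenAI-compatible endpoint where `guided_choice` can be passed via `extra_body`; in practice the selected chunk would also be attached as image content in the message.

```python
# Sketch: retrieve the most relevant image chunk via embeddings,
# then have a VLM give a constrained verdict over a fixed label set.
import numpy as np
from openai import OpenAI

# Assumes a local OpenAI-compatible server (e.g. vLLM) serving the VLM.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")


def embed_image_chunks(image_path: str) -> np.ndarray:
    """Hypothetical: tile the image and return one embedding per chunk, shape (n_chunks, dim)."""
    raise NotImplementedError


def embed_text(query: str) -> np.ndarray:
    """Hypothetical: embed the text query into the same space, shape (dim,)."""
    raise NotImplementedError


def route_and_judge(image_path: str, query: str, choices: list[str]) -> str:
    # 1) Retrieval step: score chunks against the query with cosine similarity.
    chunk_embs = embed_image_chunks(image_path)
    q = embed_text(query)
    sims = chunk_embs @ q / (
        np.linalg.norm(chunk_embs, axis=1) * np.linalg.norm(q) + 1e-8
    )
    best_chunk = int(sims.argmax())

    # 2) Judge step: the VLM answers, restricted to `choices` via guided decoding.
    resp = client.chat.completions.create(
        model="OpenGVLab/InternVL2-Llama3-76B",
        messages=[{
            "role": "user",
            "content": f"For chunk {best_chunk} of the image, answer: {query}",
        }],
        extra_body={"guided_choice": choices},  # structured-output option in vLLM
    )
    return resp.choices[0].message.content
```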