Could you publish results compared to Sonnet 3.5?

#3
by JustJaro - opened

From personal tests, Sonnet 3.5 is the SoTA model for describing figures, graphs, etc.

We do non-profit work for children with varying degrees of learning difficulties (e.g. describing diagrams, transcribing blackboards to text), and Anthropic does not yet have a consistent way to support such non-profits with API credits outside of a few competitions that, with our limited developer time, we don't engage in. It would therefore be extremely useful to know whether your model performs better!

OpenGVLab org

Thanks for your advice, I will add Sonnet 3.5 to the comparison table.

OpenGVLab org

I added Sonnet 3.5 to the comparison table for the 76B model; see here:

https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B#image-benchmarks

czczup changed discussion status to closed

Thank you! Great work on the model btw.

Bit off topic; what's your stance on models like VLM2Vec compared to VLMs? In an AI system it might make sense to route more complex queries to chunked visual embeddings, but use a VLM as a judge with guided_choice. What's your take on the matter?
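
To make the idea concrete, here is a rough sketch of what I mean (the function names, server URL, and served model are placeholders; the constrained-choice call assumes a vLLM OpenAI-compatible endpoint, which exposes guided_choice via extra_body):

```python
# Rough sketch: retrieve relevant visual chunks with a VLM2Vec-style embedder,
# then let a VLM judge pick among candidate answers via constrained decoding.
# All names below are placeholders, not a real implementation.
import numpy as np
from openai import OpenAI

# Assumes a vLLM OpenAI-compatible server running locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def embed(items):
    """Placeholder for a VLM2Vec-style embedder returning unit-norm vectors."""
    raise NotImplementedError

def route_query(query, chunk_embeddings, chunks, top_k=3):
    """Route a complex query to the most similar visual chunks (cosine similarity)."""
    q = embed([query])[0]
    scores = chunk_embeddings @ q          # embeddings assumed unit-normalised
    best = np.argsort(scores)[::-1][:top_k]
    return [chunks[i] for i in best]

def vlm_judge(question, candidate_answers, model="OpenGVLab/InternVL2-Llama3-76B"):
    """Ask the VLM to pick exactly one of the candidate answers."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        # vLLM extension: constrain the output to one of the given strings.
        extra_body={"guided_choice": candidate_answers},
    )
    return resp.choices[0].message.content
```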
