InternViT-6B + QLLaMA, can be used for image-text retrieval like CLIP
#5, opened by vitvit
Can you provide an example? (using text and image)
Hi, please see the quick start section in the model card.
https://huggingface.co/OpenGVLab/InternVL-14B-224px#quick-start
It is not clear. It specifies how to load the image encoder but not the text encoder.
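While waiting for a fuller answer, here is a minimal sketch of the CLIP-style retrieval step itself, in plain Python. It does not show how to load InternVL's text encoder; that part is still the open question in this thread. The embeddings below are made-up placeholders standing in for whatever the image and text encoders return, and `rank_texts` and `logit_scale=100.0` are hypothetical names and values chosen for illustration, not taken from the model card.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_texts(image_emb, text_embs, logit_scale=100.0):
    """Return one probability per text for a single image, CLIP style.

    logit_scale is an assumed temperature; real models learn their own value.
    """
    img = l2_normalize(image_emb)
    sims = [
        sum(a * b for a, b in zip(img, l2_normalize(txt)))
        for txt in text_embs
    ]
    return softmax([logit_scale * s for s in sims])


# Placeholder data: in practice these vectors come from the two encoders.
captions = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.9, 0.1, 0.0]

probs = rank_texts(image_emb, text_embs)
best = max(range(len(captions)), key=lambda i: probs[i])
print(captions[best], probs)
```

The only requirement on the real encoders is that image and text embeddings land in the same space with the same dimension; once they do, the ranking works the same way in the other direction (one text against many images).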