# Model Card for InternViT-6B-448px-V1-0

<img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/s0wjRQcYFdcQZa2FZ3Om7.webp" alt="Image Description" width="300" height="300">

\[[Paper](https://arxiv.org/abs/2312.14238)\] \[[GitHub](https://github.com/OpenGVLab/InternVL)\] \[[Chat Demo](https://internvl.opengvlab.com/)\] \[[Explanation in Chinese](https://zhuanlan.zhihu.com/p/675877376)\]

- Params (M): 5903
- Image size: 448 x 448
- **Pretrain Dataset:** LAION-en, LAION-COCO, COYO, CC12M, CC3M, SBU, Wukong, LAION-multi, OCR-related datasets.
- **Note:** This model has 48 blocks, and we found that using the output after the fourth-to-last block worked best for VLLM. Therefore, when building a VLLM with this model, **please use the features from the fourth-to-last layer.**

## Model Usage (Image Embeddings)
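The note above says to take features from the fourth-to-last of the 48 blocks rather than the final one. A minimal Python sketch of that indexing, using stand-in blocks (not the real InternViT-6B modules or its loading API):

```python
# Minimal sketch (assumption: illustrative stand-in blocks, NOT the real model):
# run a 48-block stack, keep every intermediate output, and select the
# fourth-to-last block's output as the feature for the VLLM.
def run_blocks(x, blocks):
    """Apply each block in turn and record every intermediate output."""
    outputs = []
    for block in blocks:
        x = block(x)
        outputs.append(x)
    return outputs

# 48 stand-in blocks; each just increments its input so outputs are traceable.
blocks = [lambda v: v + 1 for _ in range(48)]
all_outputs = run_blocks(0, blocks)

final_features = all_outputs[-1]  # output after all 48 blocks
vllm_features = all_outputs[-4]   # fourth-to-last block, as the note recommends
```

With the real checkpoint, the analogous move in `transformers` would be to request all hidden states (e.g. `output_hidden_states=True`) and index `hidden_states[-4]`; treat that as an assumption to check against this card's usage section below.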