Update README.md
README.md
CHANGED
@@ -26,7 +26,7 @@ It is _**the largest open-source vision/vision-language foundation model (14B)**
 - **Model Stats:**
   - Params (M): 5903
   - Image size: 448 x 448
-- **Pretrain Dataset:** LAION-en, LAION-COCO, COYO, CC12M, CC3M, SBU, Wukong, LAION-multi
+- **Pretrain Dataset:** LAION-en, LAION-COCO, COYO, CC12M, CC3M, SBU, Wukong, LAION-multi, OCR data
 - **Note:** This model has 48 blocks, and we found that using the output after the fourth-to-last block worked best for VLLM. Therefore, **please set mm_vision_select_layer=-4 when using this model to build VLLM.**

 ## Model Usage (Image Embeddings)
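For context on the note in the hunk above, here is a minimal, hypothetical sketch of reading image embeddings from the fourth-to-last block, i.e. the layer that `mm_vision_select_layer=-4` selects in a LLaVA-style VLLM. It assumes the checkpoint loads through `transformers` with `trust_remote_code=True` and that its forward accepts the standard `output_hidden_states` flag; the repo id and image path are placeholders, not taken from this diff.

```python
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

# Placeholder repo id -- substitute the actual checkpoint this README describes.
path = 'OpenGVLab/InternViT-6B-448px'

model = AutoModel.from_pretrained(
    path, torch_dtype=torch.bfloat16, trust_remote_code=True).cuda().eval()
processor = CLIPImageProcessor.from_pretrained(path)

# Preprocess a 448 x 448 RGB image (placeholder path).
image = Image.open('example.jpg').convert('RGB')
pixel_values = processor(images=image, return_tensors='pt').pixel_values
pixel_values = pixel_values.to(torch.bfloat16).cuda()

with torch.no_grad():
    # Request all intermediate hidden states so the output after the
    # fourth-to-last block can be selected, per the note above.
    outputs = model(pixel_values, output_hidden_states=True)

# hidden_states[-4] corresponds to what mm_vision_select_layer=-4 picks.
vision_features = outputs.hidden_states[-4]
print(vision_features.shape)
```

A LLaVA-style builder applies the same index when `mm_vision_select_layer=-4` is set for its vision tower, so the tensor above is the feature the note recommends feeding to the language model.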