OpenGVLab/InternViT-6B-448px-V1-2

Image Feature Extraction

feature-extraction


czczup committed on Apr 26, 2024

Commit dd74d28 · verified · 1 parent: b6b9784

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -15,7 +15,7 @@ pipeline_tag: image-feature-extraction
   <img src="https://cdn-uploads.huggingface.co/production/uploads/64119264f0f81eb569e0d569/re658pVjHaJEnJerlmRco.webp" alt="Image Description" width="300" height="300">
 </p>
-\[[Paper](https://arxiv.org/abs/2312.14238)\]  \[[GitHub](https://github.com/OpenGVLab/InternVL)\] \[[Chat Demo](https://internvl.opengvlab.com/)\] \[[中文解读](https://zhuanlan.zhihu.com/p/675877376)]
+\[[InternVL 1.5 Technical Report](https://arxiv.org/abs/2404.16821)\]  \[[Paper](https://arxiv.org/abs/2312.14238)\]  \[[GitHub](https://github.com/OpenGVLab/InternVL)\] \[[Chat Demo](https://internvl.opengvlab.com/)\] \[[中文解读](https://zhuanlan.zhihu.com/p/675877376)]
 We release our new InternViT weights as InternViT-6B-448px-V1-2. The continuous pre-training of the InternViT-6B model is involved in the [InternVL 1.2](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2) update. Specifically, we increased the resolution of InternViT-6B from 224 to 448 and integrated it with [Nous-Hermes-2-Yi-34B]((https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B).
 To equip the model with high-resolution processing and OCR capabilities, both the vision encoder and the MLP were activated for training, utilizing a mix of image captioning and OCR-specific datasets.
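Raising the input resolution from 224 to 448 pixels per side quadruples the number of image patches the vision encoder processes. The following is a minimal sketch of that arithmetic. It assumes square inputs and a patch size of 14 pixels for InternViT-6B. The patch size is not stated in this README, so treat it as an assumption and check it against the model config. Any class token is ignored.

```python
# Minimal sketch: patch-token count of a ViT for a square input image.
# ASSUMPTION: patch size of 14 px for InternViT-6B (not stated in this README);
# class tokens and any later token-reduction step are ignored.

PATCH_SIZE = 14  # assumed patch size, in pixels


def num_patch_tokens(image_size: int, patch_size: int = PATCH_SIZE) -> int:
    """Return the number of non-overlapping patches for a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


old_tokens = num_patch_tokens(224)  # 16 x 16 grid under the assumed patch size
new_tokens = num_patch_tokens(448)  # 32 x 32 grid under the assumed patch size
print(old_tokens, new_tokens, new_tokens // old_tokens)  # prints "256 1024 4"
```

The 4x ratio does not depend on the assumed patch size. Doubling both sides always multiplies the patch count by 4, provided the side length stays divisible by the patch size.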