Update README.md
README.md
CHANGED
@@ -70,7 +70,7 @@ model-index:
 [\[📜 Tech Report\]](https://arxiv.org/abs/2501.12386)
 <!-- [\[🗨️ Chat Demo\]](https://huggingface.co/spaces/OpenGVLab/VideoChat-Flash) -->
 
-InternVideo2.5 is a video multimodal large language model (MLLM, built upoon InternVL2.5) enhanced with **long and rich context (LRC) modeling**. It significantly improves upon existing MLLMs by enhancing their ability to perceive fine-grained details and capture long-form temporal structures. We achieve this through dense vision task annotations using direct preference optimization (TPO) and compact spatiotemporal representations via adaptive hierarchical token compression (HiCo). This model is a variant of InternVideo2.5's ablation experiment, built on HiCo technology only (R16 means 16 tokens per frame).
+InternVideo2.5 is a video multimodal large language model (MLLM, built upoon InternVL2.5) enhanced with **long and rich context (LRC) modeling**. It significantly improves upon existing MLLMs by enhancing their ability to perceive fine-grained details and capture long-form temporal structures. We achieve this through dense vision task annotations using direct preference optimization (TPO) and compact spatiotemporal representations via adaptive hierarchical token compression (HiCo). This model is a variant of InternVideo2.5's ablation experiment, built on HiCo technology only (**R16 means 16 tokens per frame**).
 
 
 