lixinhao committed on
Commit fc5bc4e · verified · 1 Parent(s): bb71951

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -70,7 +70,7 @@ model-index:
   [\[📜 Tech Report\]](https://arxiv.org/abs/2501.12386)
   <!-- [\[🗨️ Chat Demo\]](https://huggingface.co/spaces/OpenGVLab/VideoChat-Flash) -->
 
- InternVideo2.5 is a video multimodal large language model (MLLM, built upoon InternVL2.5) enhanced with **long and rich context (LRC) modeling**. It significantly improves upon existing MLLMs by enhancing their ability to perceive fine-grained details and capture long-form temporal structures. We achieve this through dense vision task annotations using direct preference optimization (TPO) and compact spatiotemporal representations via adaptive hierarchical token compression (HiCo). This model is a variant of InternVideo2.5's ablation experiment, built on HiCo technology only (R16 means 16 tokens per frame).
+ InternVideo2.5 is a video multimodal large language model (MLLM, built upoon InternVL2.5) enhanced with **long and rich context (LRC) modeling**. It significantly improves upon existing MLLMs by enhancing their ability to perceive fine-grained details and capture long-form temporal structures. We achieve this through dense vision task annotations using direct preference optimization (TPO) and compact spatiotemporal representations via adaptive hierarchical token compression (HiCo). This model is a variant of InternVideo2.5's ablation experiment, built on HiCo technology only (**R16 means 16 tokens per frame**).
 
 
 