mmaaz60 committed on
Commit
4cdb8c7
•
1 Parent(s): 8312df0

Create README.md

Files changed (1)
  1. README.md +34 -0
README.md ADDED
@@ -0,0 +1,34 @@
+ ---
+ license: apache-2.0
+ ---
+
+ # 👁️ VideoGPT+ (Phi-3-mini-4K 3.8B - Projector Pretrain Weights)
+
+ ---
+ ## 📝 Description
+ VideoGPT+ combines an image encoder, which captures detailed spatial information, with a video encoder, which captures global temporal context. It processes each video in segments and applies adaptive pooling to the features from both encoders, which improves performance across various video benchmarks.
+
+ **This repository contains the pretrained projector weights for the image encoder (CLIP L/14) and the video encoder (InternVideo2).**
+
+ ## 💻 Download
+ To download the VideoGPT+ projector pretrain weights, follow these steps:
+ ```
+ git lfs install
+ git clone https://huggingface.co/MBZUAI/VideoGPT-plus_Phi3-mini-4k_Pretrain
+ ```
+
+ ## 📚 Additional Resources
+ - **Paper:** [arXiv](https://arxiv.org/abs/2406.09418).
+ - **GitHub Repository:** For training and updates: [GitHub - VideoGPT+](https://github.com/mbzuai-oryx/VideoGPT-plus).
+ - **HuggingFace Collection:** For the pretrained checkpoints, the VCGBench-Diverse benchmark, and the training data, visit [HuggingFace Collection - VideoGPT+](https://huggingface.co/collections/MBZUAI/videogpt-665c8643221dda4987a67d8d).
+
+ ## 📜 Citations and Acknowledgments
+
+ ```bibtex
+ @article{Maaz2024VideoGPT+,
+ title={VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding},
+ author={Maaz, Muhammad and Rasheed, Hanoona and Khan, Salman and Khan, Fahad Shahbaz},
+ journal={arxiv},
+ year={2024},
+ url={https://arxiv.org/abs/2406.09418}
+ }
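
The README's Description says the model processes videos in segments and applies adaptive pooling to features from both encoders. The following is a minimal, dependency-free sketch of that idea, not the repository's implementation. It assumes each frame is already reduced to a single feature vector and pools along time only, which is a simplification. The function names (`split_into_segments`, `adaptive_avg_pool`, `pool_video`), the token ordering, and the toy dimensions are all hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def split_into_segments(frames: Sequence[Vector], num_segments: int) -> List[List[Vector]]:
    """Split a frame sequence into contiguous segments of near-equal length."""
    n = len(frames)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and len(frames)")
    bounds = [(i * n) // num_segments for i in range(num_segments + 1)]
    return [list(frames[bounds[i]:bounds[i + 1]]) for i in range(num_segments)]


def adaptive_avg_pool(tokens: Sequence[Vector], out_len: int) -> List[Vector]:
    """Average-pool a token sequence down to exactly out_len tokens.

    Bin i covers indices [floor(i*n/out_len), ceil((i+1)*n/out_len)),
    a common binning rule for adaptive pooling.
    """
    n = len(tokens)
    if n == 0 or out_len < 1:
        raise ValueError("need at least one token and out_len >= 1")
    dim = len(tokens[0])
    pooled = []
    for i in range(out_len):
        start = (i * n) // out_len
        end = ((i + 1) * n + out_len - 1) // out_len  # ceiling division
        window = tokens[start:end]
        pooled.append([sum(v[d] for v in window) / len(window) for d in range(dim)])
    return pooled


def pool_video(
    image_feats: Sequence[Vector],
    video_feats: Sequence[Vector],
    num_segments: int,
    tokens_per_segment: int,
) -> List[Vector]:
    """Pool both feature streams per segment and concatenate the tokens.

    Assumption: image tokens precede video tokens within each segment.
    """
    out: List[Vector] = []
    image_segments = split_into_segments(image_feats, num_segments)
    video_segments = split_into_segments(video_feats, num_segments)
    for img_seg, vid_seg in zip(image_segments, video_segments):
        out.extend(adaptive_avg_pool(img_seg, tokens_per_segment))
        out.extend(adaptive_avg_pool(vid_seg, tokens_per_segment))
    return out


# Toy stand-ins for encoder outputs: 8 frames, 2-dimensional features.
image_feats = [[float(t), 1.0] for t in range(8)]
video_feats = [[float(t) * 10, 2.0] for t in range(8)]
tokens = pool_video(image_feats, video_feats, num_segments=2, tokens_per_segment=2)
print(len(tokens))  # 8
```

The output length depends only on `num_segments` and `tokens_per_segment`, not on the number of frames, which is the main reason to use adaptive rather than fixed-window pooling when clip lengths vary.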