Update README.md
Browse files
README.md
CHANGED
@@ -1,11 +1,41 @@
|
|
1 |
---
|
2 |
license: mit
|
3 |
pipeline_tag: video-classification
|
4 |
-
tags:
|
5 |
-
- model_hub_mixin
|
6 |
-
- pytorch_model_hub_mixin
|
7 |
---
|
8 |
|
9 |
-
|
10 |
-
|
11 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
---
license: mit
pipeline_tag: video-classification
---

## Introduction

This repository contains the 6B model of the paper [InternVideo2](https://arxiv.org/pdf/2403.15377) in stage 2.

Code: https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo2/multi_modality

## 🚀 Installation

Please refer to https://github.com/OpenGVLab/InternVideo/blob/main/InternVideo2/multi_modality/INSTALL.md

## Usage

```python
import cv2
from transformers import AutoModel
from modeling_internvideo2 import (retrieve_text, vid2tensor, _frame_from_video,)


if __name__ == '__main__':
    model = AutoModel.from_pretrained("OpenGVLab/InternVideo2-Stage2_6B", trust_remote_code=True).eval()

    video = cv2.VideoCapture('example1.mp4')
    frames = [x for x in _frame_from_video(video)]
    text_candidates = ["A playful dog and its owner wrestle in the snowy yard, chasing each other with joyous abandon.",
                       "A man in a gray coat walks through the snowy landscape, pulling a sleigh loaded with toys.",
                       "A person dressed in a blue jacket shovels the snow-covered pavement outside their house.",
                       "A cat excitedly runs through the yard, chasing a rabbit.",
                       "A person bundled up in a blanket walks through the snowy landscape, enjoying the serene winter scenery."]

    texts, probs = retrieve_text(frames, text_candidates, model=model, topk=5)
    for t, p in zip(texts, probs):
        print(f'text: {t} ~ prob: {p:.4f}')

    vidtensor = vid2tensor('example1.mp4', fnum=4)
    feat = model.get_vid_feat(vidtensor)
```