archit11/videomae-base-finetuned-ucfcrime-full

Video Classification

Generated from Trainer

vandalism-dectection


archit11 committed on Mar 18, 2024

Commit 883ce2d (verified) · 1 Parent(s): 66c91bd

Update README.md

Files changed (1)

README.md +75 -2


@@ -7,6 +7,7 @@ tags:
 - video-classification
 - ucf-crime
 - vandalism-dectection
+- videomae
 metrics:
 - accuracy
 model-index:
@@ -30,8 +31,80 @@ More information needed
 ## Intended uses & limitations
-More information needed
+Usage:
+```python
+import av
+import torch
+import numpy as np
+from transformers import AutoImageProcessor, VideoMAEForVideoClassification
+from huggingface_hub import hf_hub_download
+np.random.seed(0)
+def read_video_pyav(container, indices):
+    '''
+    Decode the video with PyAV decoder.
+    Args:
+        container (`av.container.input.InputContainer`): PyAV container.
+        indices (`List[int]`): List of frame indices to decode.
+    Returns:
+        result (np.ndarray): np array of decoded frames of shape (num_frames, height, width, 3).
+    '''
+    frames = []
+    container.seek(0)
+    start_index = indices[0]
+    end_index = indices[-1]
+    for i, frame in enumerate(container.decode(video=0)):
+        if i > end_index:
+            break
+        if i >= start_index and i in indices:
+            frames.append(frame)
+    return np.stack([x.to_ndarray(format="rgb24") for x in frames])
+def sample_frame_indices(clip_len, frame_sample_rate, seg_len):
+    '''
+    Sample a given number of frame indices from the video.
+    Args:
+        clip_len (`int`): Total number of frames to sample.
+        frame_sample_rate (`int`): Sample every n-th frame.
+        seg_len (`int`): Maximum allowed index of sample's last frame.
+    Returns:
+        indices (`List[int]`): List of sampled frame indices
+    '''
+    converted_len = int(clip_len * frame_sample_rate)
+    end_idx = np.random.randint(converted_len, seg_len)
+    start_idx = end_idx - converted_len
+    indices = np.linspace(start_idx, end_idx, num=clip_len)
+    indices = np.clip(indices, start_idx, end_idx - 1).astype(np.int64)
+    return indices
+# video clip consists of 300 frames (10 seconds at 30 FPS)
+file_path = hf_hub_download(
+    repo_id="nielsr/video-demo", filename="eating_spaghetti.mp4", repo_type="dataset"
+)
+container = av.open(file_path)
+# sample 16 frames
+indices = sample_frame_indices(clip_len=16, frame_sample_rate=1, seg_len=container.streams.video[0].frames)
+video = read_video_pyav(container, indices)
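+# load the processor and model from this repository (the Hub id needs the "archit11/" namespace)
+image_processor = AutoImageProcessor.from_pretrained("archit11/videomae-base-finetuned-ucfcrime-full")
+model = VideoMAEForVideoClassification.from_pretrained("archit11/videomae-base-finetuned-ucfcrime-full")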
+inputs = image_processor(list(video), return_tensors="pt")
+with torch.no_grad():
+    outputs = model(**inputs)
+    logits = outputs.logits
+# model predicts one of the classes in model.config.id2label (fine-tuned on UCF-Crime, not Kinetics-400)
+predicted_label = logits.argmax(-1).item()
+print(model.config.id2label[predicted_label])
+```
 ## Training and evaluation data
 More information needed
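
Review note: `sample_frame_indices` in the added usage snippet calls `np.random.randint(converted_len, seg_len)`, which raises `ValueError` whenever the video has no more frames than `clip_len * frame_sample_rate`. The block below is a minimal sketch of one way to guard against that, not part of the commit. The function name `sample_frame_indices_safe` and the `rng` parameter are hypothetical, and the fallback policy (spread the samples over the whole clip, repeating frames if needed) is an assumption rather than the author's method. It uses only the standard library so it can be tried without NumPy.

```python
import random


def sample_frame_indices_safe(clip_len, frame_sample_rate, seg_len, rng=None):
    """Hypothetical variant of sample_frame_indices that tolerates short videos.

    Returns a list of `clip_len` non-decreasing frame indices in [0, seg_len).
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed, mirroring np.random.seed(0) in the snippet
    if clip_len < 1 or seg_len < 1:
        raise ValueError("clip_len and seg_len must both be at least 1")

    converted_len = int(clip_len * frame_sample_rate)
    if seg_len > converted_len:
        # Same windowing idea as the original: pick a random window end,
        # then look back converted_len frames.
        end_idx = rng.randrange(converted_len, seg_len)
        start_idx = end_idx - converted_len
    else:
        # Assumed fallback for short videos: use the whole clip.
        start_idx, end_idx = 0, seg_len

    if clip_len == 1:
        return [start_idx]

    # Evenly spaced positions from start_idx to end_idx, truncated to ints
    # and clamped to the last valid frame of the window.
    step = (end_idx - start_idx) / (clip_len - 1)
    last = end_idx - 1
    return [min(int(start_idx + k * step), last) for k in range(clip_len)]


if __name__ == "__main__":
    print(sample_frame_indices_safe(16, 1, 300))  # window inside a 300-frame clip
    print(sample_frame_indices_safe(16, 1, 10))   # short clip: indices repeat
```

With a short video the returned list contains repeated indices. `read_video_pyav` as written appends each decoded frame at most once, so it would still return fewer than `clip_len` frames in that case and would need adapting (for example, by indexing the decoded frames with the list) before the result is passed to the image processor.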