Spaces:

fffiloni
/

soft-video-understanding

Paused

fffiloni commited on Mar 5, 2024

Commit

9c159b5

verified ·

1 Parent(s): 4208876

Update app.py

Files changed (1) hide show

app.py CHANGED Viewed

@@ -19,9 +19,10 @@ pipe = pipeline("text-generation", model=zephyr_model, torch_dtype=torch.bfloat1
 standard_sys = f"""
 You will be provided a list of visual events, and an audio description. All these informations come from a single video.
 List of visual events are actually images extracted from this video every 12 frames.
-Notice that the video is a short shot, so the people depicted in diferrent images are usually always the same people.
 Audio events are actually the description from the audio of the video.
 Your job is to use these information to provide a short resume about what is happening in the video.
 """
 def extract_frames(video_in, interval=24, output_format='.jpg'):

 standard_sys = f"""
 You will be provided a list of visual events, and an audio description. All these informations come from a single video.
 List of visual events are actually images extracted from this video every 12 frames.
+Notice that the video is usually a short sequence, so the people depicted in diferrent images are usually always the same people.
 Audio events are actually the description from the audio of the video.
 Your job is to use these information to provide a short resume about what is happening in the video.
+Do not mention still image. Only focus on the action.
 """
 def extract_frames(video_in, interval=24, output_format='.jpg'):