Spaces: fffiloni/soft-video-understanding


fffiloni committed on Mar 5, 2024

Commit 95e144f (verified) · 1 Parent(s): 9c159b5

Update app.py

Files changed (1)

app.py

@@ -18,11 +18,10 @@ pipe = pipeline("text-generation", model=zephyr_model, torch_dtype=torch.bfloat1
 standard_sys = f"""
 You will be provided a list of visual events, and an audio description. All these informations come from a single video.
-List of visual events are actually images extracted from this video every 12 frames.
-Notice that the video is usually a short sequence, so the people depicted in diferrent images are usually always the same people.
+List of visual events are actually extracted from this video every 12 frames.
+These visual infos are extracted from a video that is usually a short sequence, so the people depicted in different visual events are usually showing the same people.
 Audio events are actually the description from the audio of the video.
-Your job is to use these information to provide a short resume about what is happening in the video.
-Do not mention still image. Only focus on the action.
+Your job is to use these information to smartly deduce and provide a very short resume about what is happening in the video.
 """
 def extract_frames(video_in, interval=24, output_format='.jpg'):
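
Review note: the prompt text says frames are taken "every 12 frames", while the visible signature of `extract_frames` defaults to `interval=24`. The hunk does not show how `extract_frames` is called, so the two may already agree at the call site. If they do not, one option is to render the interval into the prompt instead of hard-coding it. The sketch below is only an illustration under that assumption: `FRAME_INTERVAL`, `build_system_prompt`, and `sampled_frame_indices` are hypothetical names that do not appear in app.py, the prompt wording is lightly cleaned up rather than copied, and sampling is assumed to start at frame 0.

```python
# Hypothetical constant: assumed to mirror the default of extract_frames(interval=24).
FRAME_INTERVAL = 24


def build_system_prompt(interval: int = FRAME_INTERVAL) -> str:
    """Render the system prompt so it states the sampling interval actually used.

    Hypothetical helper, not part of app.py.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    return (
        "You will be provided a list of visual events, and an audio description. "
        "All of this information comes from a single video.\n"
        f"The visual events are extracted from this video every {interval} frames.\n"
        "The video is usually a short sequence, so the people depicted in "
        "different visual events are usually the same people.\n"
        "The audio events are a description of the video's audio track.\n"
        "Your job is to use this information to deduce and provide a very short "
        "summary of what is happening in the video.\n"
    )


def sampled_frame_indices(total_frames: int, interval: int = FRAME_INTERVAL) -> list[int]:
    """Return the frame indices a fixed-interval sampler would keep.

    Assumes sampling starts at frame 0; the real extract_frames may differ.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    return list(range(0, total_frames, interval))


if __name__ == "__main__":
    print(build_system_prompt())
    print(sampled_frame_indices(50))
```

With a single source of truth for the interval, changing the sampling rate in one place keeps the prompt and the extraction step consistent.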