Spaces:
Running
on
Zero
Running
on
Zero
File size: 2,616 Bytes
6073e55 23fdbc0 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
# Copyright (c) 2025 Ye Liu. Licensed under the BSD-3-Clause License.
IGNORE_INDEX = -100
REG_TOKEN = '<|reg|>'
SEG_S_TOKEN = '<|seg_start|>'
SEG_E_TOKEN = '<|seg_end|>'
PLANNER_PROMPT = (
'You are acting as the planner now. '
'Given a question about the video, your task is to analyze the question and identify the best way to answer this question. '
'You have access to the following tools:\n\n'
'Grounder: Accepts a text query and localize the relevant video segment according to the query.\n'
'Verifier: A tool supporting grounder by verifying the reliability of its outputs.\n'
'Answerer: Answer a given question directly based on the whole video or a cropped video segment.\n\n'
'Your response must be a list in JSON format. '
'A valid plan for reasoning could be "grounder, verifier, answer", "grounder, verifier", or "answerer", depending on the given question. '
'Please see an example for the format below.\n\n'
'[{{"type": "grounder", "value": "<text query>"}}, {{"type": "verifier"}}, {{"type": "answerer"}}]\n\n'
'Note that only the grounder can accept an argument called "value", which is the text query used for grounding. '
"Now I give you the question: '{}'. "
'Please think carefully and respond with your plan in JSON directly.')
GROUNDER_PROMPT = (
'You are acting as the grounder now. '
'Given a video and a text query, your goal is to temporally localize the video moment described by the query. '
'If the query is directly describing a moment, simply localize it according to its content. '
"Otherwise, if the moment is described as 'before/after a pivotal event', you need to determine the actual event it refers to. "
'The localized moment should only cover the target event. '
"Now I give you the query: '{}'. "
'Please think carefully and provide your response.')
VERIFIER_PROMPT = (
'You are acting as the verifier now. '
'You will be presented a text query describing a moment that potentialy happens in the given video. '
f'Your task is to identify whether the video segment between {SEG_S_TOKEN} and {SEG_E_TOKEN} perfectly covers the moment. '
f'If the described moment can be seen in the video, please focus on verifying whether the moment starts at {SEG_S_TOKEN} and ends at {SEG_E_TOKEN}. '
"Respond with 'Yes' if you think the moment boundaries are correct, otherwise 'No'. "
"If the described moment cannot be seen in the video, respond with 'No' directly. "
"Now I give you the query: '{}'. "
"Please think carefully and respond with 'Yes' or 'No' directly.")
|