Update constraint.py
constraint.py CHANGED +86 -1
@@ -1,6 +1,91 @@
 SYS_PROMPT = ""
 
-USER_PROMPT = """
+USER_PROMPT = """# CONTEXT #
+
+You are a powerful video captioner. I want to tag 200,000 video files for use in training a text-to-video dataset. The purpose of the video tags is to train a text-to-video model. You need to provide a structured, detailed, and accurate description of the given video.
+
+# OBJECTIVE #
+
+Video Description Task Instructions
+Video Content Description:
+
+Detail and Accuracy: Provide a detailed and accurate description of the video content. Include all key objects, their types, colors, actions, positions, and relative positions. Describe the overall atmosphere.
+Persons and Animals: If there are people, describe their appearance and actions. If there are animals, describe their behavior to give a clear understanding of the scene.
+Multiple Scenes: If the video has multiple scenes, describe how they transition and highlight the differences between them.
+Objectivity: Do not include imagined content or overly subjective feelings. Ensure all descriptions are based on what can be confidently determined from the video.
+Grammar and Length: Use correct English grammar. The description should be at least three sentences long.
+Video Quality Evaluation:
+
+Aesthetic Value: Evaluate the aesthetic value, including composition, color harmony, and overall visual effect. Score this aspect from 1 to 5 and explain your reasoning.
+Clarity: Assess the clarity, including resolution and detail presentation. Score this aspect from 1 to 5 and explain your reasoning.
+Emotional Impact: Evaluate the emotional impact, including how well the video conveys emotions and resonates with the audience. Score this aspect from 1 to 5 and explain your reasoning.
+Summary: Provide a summary of the scores for aesthetic value, clarity, and emotional impact.
+Film Perspective Analysis:
+
+Shot Analysis: Analyze the type of shots used (close-up, medium, long shot, etc.).
+Camera Movements: Describe the camera movements (push, pull, pan, tilt, track, crane, etc.).
+Composition: Analyze the composition of the shots.
+Interpretation: Provide your interpretation and feelings about the photographic work.
+
+# STYLE #
+Cinematic language, such as narrative techniques, visual aesthetics, editing styles, and sound design.
+
+# Output Structure #
+Video Content:
+
+{Detailed description of the video here, meeting the above requirements}.
+
+Video Quality:
+
+{Evaluation score and explanation of the video quality here}.
+Film Perspective Description:
+
+
+{Analysis of the video from a film perspective here}.
+Example:
+
+Video Content:
+
+A stylish woman strides down a Tokyo street illuminated by warm neon lights and animated city signage. She sports a black leather jacket, a long red dress, black boots, and carries a black purse. Her look is completed with sunglasses and red lipstick. Her demeanor is confident and casual. The damp street reflects the vibrant lights, creating a mirror effect. The scene is bustling with numerous pedestrians.
+Video Quality:
+
+Aesthetic Value:
+- Composition and Color: The video showcases a well-balanced composition with harmonious color schemes, achieving a visually pleasing effect. Techniques such as symmetry and dynamic composition are skillfully employed.
+- Camera Work: The visual experience is enhanced by smooth transitions and diverse angles.
+- Score: 4/5
+
+Clarity:
+- Resolution: The video boasts high resolution with clear details.
+- Detail Presentation: It presents rich details with no noticeable blurriness or distortion.
+- Score: 5/5
+
+Emotional Impact:
+- Emotion Conveyance: The video successfully conveys joy and excitement, striking a chord with the audience.
+- Resonance: The compelling emotional expression, supported by well-integrated music and visuals, creates a strong impact.
+- Score: 4/5
+
+Summary:
+- Aesthetic Value: 4/5
+- Video Clarity: 5/5
+- Emotional Impact: 4/5
+Film Perspective Description:
+
+Characters:
+- Woman: A stylish woman dressed in a black leather jacket, long red dress, black boots, and carrying a black purse. She wears sunglasses and red lipstick.
+
+Scenes:
+- Tokyo Street: The street is filled with warm glowing neon lights and animated city signage, with damp reflective surfaces and numerous pedestrians.
+
+Shot 1:
+- The woman walks confidently and casually down the Tokyo street.
+- She heads towards the camera in a panoramic view with central composition. The camera is at eye level and follows her with a handheld shot.
+- Duration: 36 seconds
+
+Shot 2:
+- The woman continues her walk down the Tokyo street, maintaining her confident and casual demeanor.
+- She approaches the camera, with a close-up of her face, transitioning to a torso mid-shot. The camera remains at eye level, following her with a handheld shot.
+- Duration: 24 seconds
+
 """
 
 SKIP = 2
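
The "# Output Structure #" part of USER_PROMPT asks for three top-level sections: "Video Content:", "Video Quality:", and "Film Perspective Description:". A minimal sketch of splitting a model response along those headings might look like the following. The helper name split_sections is hypothetical and not part of constraint.py, and the sketch assumes each heading appears alone on its own line, as in the prompt's example.

```python
from __future__ import annotations

import re

# Top-level headings requested by the "# Output Structure #" part of USER_PROMPT.
SECTION_HEADINGS = ("Video Content", "Video Quality", "Film Perspective Description")


def split_sections(response: str) -> dict[str, str]:
    """Split a response into its top-level sections (hypothetical helper).

    Assumes each heading sits alone on a line, followed by a colon.
    Headings that never appear are simply absent from the result.
    """
    pattern = re.compile(
        r"^(%s):[ \t]*$" % "|".join(re.escape(h) for h in SECTION_HEADINGS),
        re.MULTILINE,
    )
    matches = list(pattern.finditer(response))
    sections: dict[str, str] = {}
    for i, match in enumerate(matches):
        # A section runs until the next top-level heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(response)
        sections[match.group(1)] = response[match.end():end].strip()
    return sections
```

Lines such as "- Video Clarity: 5/5" are not treated as headings, because the pattern only matches a line that consists of one of the three heading names and a colon.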
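
The prompt asks for each quality criterion to be scored from 1 to 5, and its example writes those scores as "- Score: N/5" under the headings "Aesthetic Value:", "Clarity:", and "Emotional Impact:". Assuming a response follows that layout, the scores could be pulled out and range-checked roughly as below. The function extract_scores is a hypothetical illustration and not part of constraint.py.

```python
from __future__ import annotations

import re

# Criteria named in the "Video Quality Evaluation" part of USER_PROMPT.
CRITERIA = ("Aesthetic Value", "Clarity", "Emotional Impact")

SCORE_LINE = re.compile(r"-\s*Score:\s*(\d+)\s*/\s*5")


def extract_scores(quality_text: str) -> dict[str, int]:
    """Map each criterion heading to its "- Score: N/5" value (hypothetical helper).

    Raises ValueError when a score falls outside the 1 to 5 range the prompt asks for.
    """
    scores: dict[str, int] = {}
    current = None
    for raw_line in quality_text.splitlines():
        line = raw_line.strip()
        if line.endswith(":") and line[:-1] in CRITERIA:
            current = line[:-1]  # remember which criterion the next score belongs to
            continue
        match = SCORE_LINE.fullmatch(line)
        if match and current is not None:
            value = int(match.group(1))
            if not 1 <= value <= 5:
                raise ValueError(f"{current}: score {value} is outside 1-5")
            scores[current] = value
            current = None
    return scores
```

Summary lines such as "- Aesthetic Value: 4/5" are ignored here, since only the per-criterion "- Score:" lines are read. Cross-checking the summary against them would be a natural extension.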