# Datasets and Evaluation

## SSv2-ST (SSv2 Spatio-Temporal dataset)

### Pre-processing

Our pre-processing pipeline works as follows. We first extract the first noun chunk of each caption with spaCy and use it as the subject. The subject is then passed to OWL-ViT-L as a text query to obtain bounding boxes. If no bounding box is found for the subject, we move on to the next caption in the dataset. If there are at least two bounding boxes, we linearly interpolate bounding boxes for the missing frames.

Downloading the dataset is somewhat involved: follow the instructions in the [Paxion repository](https://github.com/MikeWangWZHL/Paxion#dataset-setup). Once the dataset is downloaded, run `generate_ssv2_st.py`.

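The caption-selection and interpolation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in `generate_ssv2_st.py`: the function names are hypothetical, and `detect` stands in for a hypothetical wrapper that runs OWL-ViT-L on the video's frames and returns a mapping from frame index to one `(x_min, y_min, x_max, y_max)` box. Two behaviours are assumptions not stated in the description: frames before the first or after the last detection reuse the nearest detected box, and a caption with exactly one detected box is skipped like a caption with none.

```python
import bisect
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# (x_min, y_min, x_max, y_max); this coordinate convention is an assumption.
Box = Tuple[float, float, float, float]


def first_noun_chunk(caption: str, nlp) -> Optional[str]:
    """Return the first noun chunk of `caption`, or None if it has none.

    `nlp` is a loaded spaCy pipeline with a dependency parser
    (noun_chunks needs one), e.g. spacy.load("en_core_web_sm").
    """
    for chunk in nlp(caption).noun_chunks:
        return chunk.text
    return None


def interpolate_boxes(detections: Dict[int, Box], num_frames: int) -> Optional[List[Box]]:
    """Fill in boxes for frames without a detection by linear interpolation.

    Returns None if fewer than two frames have a box. Frames outside the
    detected range copy the nearest detected box (assumption).
    """
    if len(detections) < 2:
        return None
    keys = sorted(detections)
    boxes: List[Box] = []
    for t in range(num_frames):
        if t <= keys[0]:
            boxes.append(tuple(detections[keys[0]]))
        elif t >= keys[-1]:
            boxes.append(tuple(detections[keys[-1]]))
        else:
            # keys[j - 1] <= t < keys[j]: the two detections bracketing frame t
            j = bisect.bisect_right(keys, t)
            t0, t1 = keys[j - 1], keys[j]
            w = (t - t0) / (t1 - t0)
            b0, b1 = detections[t0], detections[t1]
            boxes.append(tuple((1 - w) * a + w * b for a, b in zip(b0, b1)))
    return boxes


def build_sample(
    captions: Sequence[str],
    extract_subject: Callable[[str], Optional[str]],
    detect: Callable[[str], Dict[int, Box]],  # hypothetical OWL-ViT-L wrapper
    num_frames: int,
) -> Optional[Tuple[str, str, List[Box]]]:
    """Return (caption, subject, per-frame boxes) for the first usable caption."""
    for caption in captions:
        subject = extract_subject(caption)
        if subject is None:
            continue  # caption without a noun chunk: try the next one
        detections = detect(subject)
        if not detections:
            continue  # no boxes for this subject: try the next caption
        boxes = interpolate_boxes(detections, num_frames)
        if boxes is not None:
            return caption, subject, boxes
        # Exactly one box: skipped here (assumption; not specified above).
    return None
```

With spaCy installed, `extract_subject` could be `lambda c: first_noun_chunk(c, nlp)` with `nlp = spacy.load("en_core_web_sm")`; the model name is only an example. For instance, detections at frames 0 and 4 with boxes `(0, 0, 10, 10)` and `(4, 0, 14, 10)` yield `(2.0, 0.0, 12.0, 10.0)` for frame 2.
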
## Interactive Motion Control (IMC)

We generate the bounding boxes for this dataset with `generate_imc.py`. The prompts are listed in `custom_prompts.csv` and `filtered_prompts.csv`.

For more details on the datasets and the evaluation strategy, please refer to the [Peekaboo paper](https://arxiv.org/abs/2312.07509).