Yongxin-Guo committed on
Commit
ee74ab3
1 Parent(s): 9f1b404

Create README.md

  1. README.md +59 -0
README.md ADDED
@@ -0,0 +1,59 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ base_model:
+ - mistralai/Mistral-7B-Instruct-v0.2
+ tags:
+ - video temporal grounding
+ - dense video caption
+ - video highlight detection
+ ---
+
+ <h2 align="center"> <a href="https://arxiv.org/abs/2410.05643">TRACE: Temporal Grounding Video LLM via Causal Event Modeling</a></h2>
+ <h5 align="center"> If our project helps you, please give us a star ⭐ on <a href="https://github.com/gyxxyg/TRACE">GitHub</a> and cite our paper!</h5>
+
+ ## 📰 News
+
+ - **[2024.10.10]** 🔥 Our [code](https://github.com/gyxxyg/TRACE) and [paper](https://arxiv.org/abs/2410.05643) are released!
+ - **[2024.10.10]** 🔥 Our **checkpoints** are available now!
+
+ ## Overview
+
+ In this work:
+ - We model videos as a series of events and propose a causal event modeling framework to capture the inherent structure of videos.
+ - We present TRACE, a novel task-interleaved video LLM tailored to implement the causal event modeling framework through the sequential encoding/decoding of timestamps, salient scores, and textual captions.
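
To make the event view concrete, here is a minimal sketch of how a video's output could be represented as a sequence of events, each carrying a timestamp span, a salient score, and a caption, and then flattened in that order. The `Event` class, the `interleave` helper, the field names, and the sort-by-start-time ordering are all assumptions made for illustration; they are not taken from the TRACE codebase, and the actual tokenization in the model may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """One video event (hypothetical container, not the TRACE API)."""
    start: float   # event start time in seconds
    end: float     # event end time in seconds
    score: float   # salient score of the event
    caption: str   # textual description of the event


def interleave(events: List[Event]) -> List[Tuple[str, object]]:
    """Flatten events into (kind, value) items: time, score, caption per event.

    Events are ordered by start time (an assumption for this sketch), so each
    event's items follow the items of all earlier events.
    """
    sequence: List[Tuple[str, object]] = []
    for event in sorted(events, key=lambda e: e.start):
        sequence.append(("time", (event.start, event.end)))
        sequence.append(("score", event.score))
        sequence.append(("caption", event.caption))
    return sequence


if __name__ == "__main__":
    # Toy events invented for illustration only.
    toy_events = [
        Event(12.0, 20.5, 4.0, "stir the pan"),
        Event(0.0, 10.0, 3.5, "chop the onions"),
    ]
    for kind, value in interleave(toy_events):
        print(kind, value)
```

Keeping the three parts of an event adjacent in the flattened sequence is what lets a decoder condition each event on everything emitted for the previous ones.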
+
+ ## Model Zoo
+
+ | Checkpoints | Description | URL |
+ | ----------- | ----------- | ----------- |
+ | Initialization | Weights initialized from VideoLLaMA2 | [trace-init](https://huggingface.co/Yongxin-Guo/trace-init) |
+ | Stage-1 | Model checkpoint after stage-1 training | [trace-stage1](https://huggingface.co/Yongxin-Guo/trace-stage1) |
+ | Stage-2 | Model checkpoint after stage-2 training | [trace](https://huggingface.co/Yongxin-Guo/trace) |
+ | FT-Charades | Fine-tuned on the Charades-STA dataset | [trace-ft-charades](https://huggingface.co/Yongxin-Guo/trace-ft-charades) |
+ | FT-Youcook2 | Fine-tuned on the YouCook2 dataset | [trace-ft-youcook2](https://huggingface.co/Yongxin-Guo/trace-ft-youcook2) |
+ | FT-QVHighlights | Fine-tuned on the QVHighlights dataset | [trace-ft-qvhighlights](https://huggingface.co/Yongxin-Guo/trace-ft-qvhighlights) |
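
One possible way to fetch any of the checkpoints listed above is through `huggingface_hub.snapshot_download`. The sketch below is only a convenience wrapper: the short keys in `CHECKPOINTS` and the helper names are made up for this example, while the repository IDs are copied from the URLs in the table. How the downloaded weights are then loaded is described in the GitHub repository, not here.

```python
from typing import Optional

# Repository IDs taken from the Model Zoo URLs; the short keys are hypothetical.
CHECKPOINTS = {
    "init": "Yongxin-Guo/trace-init",
    "stage1": "Yongxin-Guo/trace-stage1",
    "stage2": "Yongxin-Guo/trace",
    "ft-charades": "Yongxin-Guo/trace-ft-charades",
    "ft-youcook2": "Yongxin-Guo/trace-ft-youcook2",
    "ft-qvhighlights": "Yongxin-Guo/trace-ft-qvhighlights",
}


def checkpoint_repo(name: str) -> str:
    """Map a short checkpoint key to its Hugging Face repository ID."""
    try:
        return CHECKPOINTS[name]
    except KeyError:
        known = ", ".join(sorted(CHECKPOINTS))
        raise ValueError(f"Unknown checkpoint '{name}'. Known keys: {known}")


def download_checkpoint(name: str, local_dir: Optional[str] = None) -> str:
    """Download all files of a checkpoint and return the local folder path."""
    # Imported lazily so the key lookup above works without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=checkpoint_repo(name), local_dir=local_dir)


if __name__ == "__main__":
    # Requires `pip install huggingface_hub` and network access.
    print(download_checkpoint("stage2"))
```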
+
+ #### Results
+
+ | Youcook2 (Zero-Shot) | CIDEr | METEOR | SODA_c | F1 |
+ | --- | --- | --- | --- | --- |
+ | TRACE | 8.1 | 2.8 | 2.2 | 22.4 |
+
+ | Charades-STA (Zero-Shot) | R@1 (IoU=0.3) | R@1 (IoU=0.5) | R@1 (IoU=0.7) | mIoU |
+ | --- | --- | --- | --- | --- |
+ | TRACE | 58.6 | 40.3 | 19.4 | 38.7 |
+
+ | QVHighlights (Zero-Shot) | mAP | Hit@1 |
+ | --- | --- | --- |
+ | TRACE | 26.8 | 42.7 |
+
+ | ActivityNet-DVC | CIDEr | METEOR | SODA_c | F1 |
+ | --- | --- | --- | --- | --- |
+ | TRACE | 25.9 | 6.0 | 6.4 | 39.3 |
+
+ | ActivityNet-MR | R@1 (IoU=0.3) | R@1 (IoU=0.5) | R@1 (IoU=0.7) | mIoU |
+ | --- | --- | --- | --- | --- |
+ | TRACE | 53.0 | 37.7 | 24.0 | 39.0 |
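
For readers unfamiliar with the moment-retrieval columns in the Charades-STA and ActivityNet-MR tables: the 0.3, 0.5, and 0.7 entries are conventionally the percentage of queries whose top prediction reaches that temporal IoU with the ground truth, and mIoU is the mean temporal IoU. The sketch below shows that computation under this conventional reading; the function names and the toy spans are illustrative assumptions, not the evaluation script used for the reported numbers.

```python
from typing import Dict, Sequence, Tuple

Span = Tuple[float, float]  # (start, end) in seconds


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two time spans."""
    intersection = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - intersection
    return intersection / union if union > 0 else 0.0


def summarize(
    preds: Sequence[Span],
    gts: Sequence[Span],
    thresholds: Sequence[float] = (0.3, 0.5, 0.7),
) -> Tuple[Dict[float, float], float]:
    """Return (recall at each IoU threshold, mean IoU), both in percent."""
    if not preds or len(preds) != len(gts):
        raise ValueError("preds and gts must be non-empty and the same length")
    ious = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    count = len(ious)
    recall = {
        t: 100.0 * sum(iou >= t for iou in ious) / count for t in thresholds
    }
    mean_iou = 100.0 * sum(ious) / count
    return recall, mean_iou


if __name__ == "__main__":
    # Toy spans invented for illustration; not data from any benchmark.
    predictions = [(0.0, 10.0), (5.0, 15.0)]
    ground_truth = [(0.0, 10.0), (10.0, 20.0)]
    print(summarize(predictions, ground_truth))
```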