wingrune committed on
Commit
4b9223c
·
verified ·
1 Parent(s): 35ac252

Update README.md

Files changed (1)
  1. README.md +30 -3
README.md CHANGED
@@ -1,3 +1,30 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ pipeline_tag: visual-question-answering
+ ---
+ # 3DGraphLLM
+
+ 3DGraphLLM is a model that uses a 3D scene graph and an LLM to perform 3D vision-language tasks.
+
+ <p align="center">
+ <img src="ga.png" width="80%">
+ </p>
+
+
+ ## Model Details
+
+ We provide our best checkpoint, which uses [Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) as the LLM, [Mask3D](https://github.com/JonasSchult/Mask3D) for 3D instance segmentation to obtain scene graph nodes, [VL-SAT](https://github.com/wz7in/CVPR2023-VLSAT) to encode semantic relations, [Uni3D](https://github.com/baaivision/Uni3D) as the 3D object encoder, and [DINOv2](https://github.com/facebookresearch/dinov2) as the 2D object encoder.
+
+ ## Citation
+ If you find 3DGraphLLM helpful, please consider citing our work as:
+ ```
+ @misc{zemskova20243dgraphllm,
+ title={3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding},
+ author={Tatiana Zemskova and Dmitry Yudin},
+ year={2024},
+ eprint={2412.18450},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV},
+ url={https://arxiv.org/abs/2412.18450},
+ }
+ ```