|
--- |
|
license: apache-2.0 |
|
pipeline_tag: image-text-to-text |
|
library_name: transformers |
|
--- |
|
|
|
# Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation |
|
|
|
<p align="left"> |
|
<a href='https://jiwanchung.github.io/' target='_blank'>Jiwan Chung<sup>*</sup></a>  |
|
<a href='https://junhyeok.kim/' target='_blank'>Junhyeok Kim<sup>*</sup></a>  |
|
<a href='https://scholar.google.com/citations?user=w3hOuRoAAAAJ' target='_blank'>Siyeol Kim</a>  |
|
<a href='https://jaeyoung-l.github.io/' target='_blank'>Jaeyoung Lee</a>  |
|
<a href="https://scholar.google.com/citations?user=Og3gN_AAAAAJ" target='_blank'>Minsoo Kim</a>  |
|
<a href='https://mirlab.yonsei.ac.kr/' target='_blank'>Youngjae Yu</a> |
|
</p> |
|
|
|
[](https://arxiv.org/abs/2505.18842) [](https://huggingface.co/kjunh/v1-7B) |
|
|
|
<p align="center"> |
|
<img src="assets/figure.png"> |
|
</p> |
|
|
|
## Installation |
|
```bash |
|
conda create -n v1 python=3.10 -y |
|
conda activate v1 |
|
pip install -r requirements.txt |
|
pip install flash-attn --no-build-isolation |
|
``` |
|
|
|
## Demo |
|
|
|
### Gradio Web UI |
|
Highly Recommended as the copy tokens are displayed on image. |
|
|
|
<p align="center"> |
|
<img src="assets/demo.png"> |
|
</p> |
|
|
|
```bash |
|
python run_gradio.py |
|
``` |
|
|
|
### Inference |
|
```bash |
|
python inference.py |
|
``` |
|
The script uses a default image URL and text prompt. To use your own inputs, you can modify the `image` variable within the `messages` list and the `text` field for the user prompt. |
|
|
|
## Coming Soon |
|
- [x] Inference code |
|
- [ ] Training data |
|
- [ ] Evaluation code |
|
- [ ] Training code |
|
|
|
|
|
## Citation |
|
If you find our work valuable, please cite: |
|
```bibtex |
|
@misc{chung2025dontlookoncemultimodal, |
|
title={Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation}, |
|
author={Jiwan Chung and Junhyeok Kim and Siyeol Kim and Jaeyoung Lee and Min Soo Kim and Youngjae Yu}, |
|
year={2025}, |
|
eprint={2505.18842}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL}, |
|
url={https://arxiv.org/abs/2505.18842}, |
|
} |
|
``` |