---
license: mit
---

![header](./assets/header.png)

<p align="center">
<a href="" target="_blank">Paper</a> • <a href="" target="_blank">Demo</a> • <a href="https://github.com/FreedomIntelligence/LongLLaVA" target="_blank">LongLLaVA</a>
</p>

![efficiency](./assets/singleGPU.png)

## Update

* **[2024.09.05]** The LongLLaVA repo is published!

## Architecture

<details>
<summary>Click to view the architecture image</summary>

![Architecture Image](./assets/arch.png)

</details>

## Results

<details>
<summary>Click to view the results</summary>

- Main Results

![Main Results](./assets/result1.png)

- Diagnostic Results

![Diagnostic Results](./assets/diaresult.png)

- Video-NIAH

![Video-NIAH](./assets/NIAH.png)

</details>

## Results Reproduction

### Data Download and Construction

<details>
<summary>Dataset Taxonomy</summary>

![Dataset](./assets/dataset.png)

</details>

<details>
<summary>Dataset Downloading and Construction</summary>

> Coming soon.

</details>

### Evaluation

> The model checkpoint is coming soon.

## Citation

```
@misc{wang2024longllavascalingmultimodalllms,
      title={LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture},
      author={Xidong Wang and Dingjie Song and Shunian Chen and Chen Zhang and Benyou Wang},
      year={2024},
      eprint={2409.02889},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.02889},
}
```