---
license: mit
pipeline_tag: image-text-to-text
---

This model was introduced in the paper [Docopilot: Improving Multimodal Models for Document-Level Understanding](https://openaccess.thecvf.com/content/CVPR2025/html/Duan_Docopilot_Improving_Multimodal_Models_for_Document-Level_Understanding_CVPR_2025_paper.html) (CVPR 2025).

For details, see the [Docopilot GitHub repository](https://github.com/OpenGVLab/Docopilot).