---
license: mit
pipeline_tag: image-text-to-text
---

This model was introduced in the paper [Docopilot: Improving Multimodal Models for Document-Level Understanding](https://openaccess.thecvf.com/content/CVPR2025/html/Duan_Docopilot_Improving_Multimodal_Models_for_Document-Level_Understanding_CVPR_2025_paper.html) (CVPR 2025). For code and further details, see https://github.com/OpenGVLab/Docopilot.