---
license: apache-2.0
---

![Image](assets/logo.jpeg)
# HaploVL - A Single-Transformer Baseline for Multi-Modal Understanding

[![Project page](https://img.shields.io/badge/Project_page-green)](https://haplo-vl.github.io/)
HaploVL is a multimodal understanding foundation model that delivers comprehensive cross-modal understanding of text, image, and video inputs through a single transformer architecture.

## Highlights

This repository contains the PyTorch implementation, model weights, and training code for **Haplo**.

![Image](assets/framework.png)

🌟 **Unified Architecture**: A single transformer model supports early fusion of multi-modal inputs and auto-regressive response generation.

🌟 **Efficient Training**: An optimized training recipe leverages pre-trained knowledge while reducing resource consumption.

🌟 **Scalable Design**: A flexible framework runs on both Ascend NPU and GPU environments.

🌟 **Extended Capabilities**: Multi-image understanding and video processing are supported natively.

## Getting Started

### Installation

```bash
# Option 1: install directly from GitHub
pip install git+https://github.com/Tencent/HaploVLM.git

# Option 2: install from a local clone in editable mode
git clone https://github.com/Tencent/HaploVLM.git
cd HaploVLM
pip install -e . -v
```

### Quick Start

Basic usage example (a video-input sketch appears at the end of this README):

```python
import torch

from haplo import HaploProcessor, HaploForConditionalGeneration

processor = HaploProcessor.from_pretrained('stevengrove/Haplo-7B-Pro-Video')
model = HaploForConditionalGeneration.from_pretrained(
    'stevengrove/Haplo-7B-Pro-Video',
    torch_dtype=torch.bfloat16
).to('cuda')

conversation = [
    {'role': 'user', 'content': [
        {'type': 'text', 'text': 'Describe this image.'},
        {'type': 'image', 'path': 'assets/example-image.png'}
    ]}
]

inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors='pt'
).to('cuda')

outputs = model.generate(inputs)
print(processor.decode(outputs[0]))
```

## Acknowledgement

```bibtex
@article{yang2024haplo,
  title={HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding},
  author={Yang, Rui and Song, Lin and Xiao, Yicheng and Huang, Runhui and Ge, Yixiao and Shan, Ying and Zhao, Hengshuang},
  journal={arXiv preprint arXiv:xxxx.xxxxx},
  year={2025}
}
```
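
## Video Input (Sketch)

The highlights above mention native video processing, and the published checkpoint is named `Haplo-7B-Pro-Video`, but this README only shows an image example. The sketch below is a minimal, hedged adaptation of the Quick Start code: it assumes the processor's chat template also accepts a `{'type': 'video', 'path': ...}` content entry and uses a hypothetical local file `assets/example-video.mp4`. Both are assumptions, not confirmed API.

```python
import torch

from haplo import HaploProcessor, HaploForConditionalGeneration

# Sketch only: reuses the Quick Start interface for a video prompt.
# The 'video' content type and the example file path are assumptions.
processor = HaploProcessor.from_pretrained('stevengrove/Haplo-7B-Pro-Video')
model = HaploForConditionalGeneration.from_pretrained(
    'stevengrove/Haplo-7B-Pro-Video',
    torch_dtype=torch.bfloat16
).to('cuda')

conversation = [
    {'role': 'user', 'content': [
        {'type': 'text', 'text': 'Summarize what happens in this video.'},
        {'type': 'video', 'path': 'assets/example-video.mp4'}  # hypothetical path
    ]}
]

inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors='pt'
).to('cuda')

outputs = model.generate(inputs)
print(processor.decode(outputs[0]))
```

If the processor expects a different content key for video, only the `conversation` entry should need to change; model loading and generation follow the same pattern as the image example.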