Spaces:

Demo750
/

XGBoost_Gaze

Running

App Files Files Community

XGBoost_Gaze / MiniCPM-V /docs /best_practice_summary.md

Demo750

Upload folder using huggingface_hub

569f484 verified 3 months ago

preview code

raw

history blame

2.06 kB

	# MiniCPM-V Best Practices

	MiniCPM-V is a series of end-side multimodal LLMs (MLLMs) designed for vision-language understanding. The models take image, video and text as inputs and provide high-quality text output, aiming to achieve strong performance and efficient deployment. The most notable models in this series currently include MiniCPM-Llama3-V 2.5 and MiniCPM-V 2.6. The following sections provide detailed tutorials and guidance for each version of the MiniCPM-V models.


	## MiniCPM-V 2.6

	MiniCPM-V 2.6 is the latest and most capable model in the MiniCPM-V series. With a total of 8B parameters, the model surpasses GPT-4V in single image, multi-image and video understanding. It outperforms GPT-4o mini, Gemini 1.5 Pro and Claude 3.5 Sonnet in single image understanding, and advances MiniCPM-Llama3-V 2.5's features such as strong OCR capability, trustworthy behavior, multilingual support, and end-side deployment. Due to its superior token density, MiniCPM-V 2.6 can for the first time support real-time video understanding on end-side devices such as iPad.

	* [Deployment Tutorial](https://modelbest.feishu.cn/wiki/C2BWw4ZP0iCDy7kkCPCcX2BHnOf)
	* [Training Tutorial](https://modelbest.feishu.cn/wiki/GeHMwLMa0i2FhUkV0f6cz3HWnV1)
	* [Quantization Tutorial](https://modelbest.feishu.cn/wiki/YvsPwnPwWiqUjlkmW0scQ76TnBb)

	## MiniCPM-Llama3-V 2.5

	MiniCPM-Llama3-V 2.5 is built on SigLip-400M and Llama3-8B-Instruct with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-V 2.0.

	* [Quantization Tutorial](https://modelbest.feishu.cn/wiki/Kc7ywV4X1ipSaAkuPFOc9SFun8b)
	* [Training Tutorial](https://modelbest.feishu.cn/wiki/UpSiw63o9iGDhIklmwScX4a6nhW)
	* [End-side Deployment](https://modelbest.feishu.cn/wiki/Lwr9wpOQdinr6AkLzHrc9LlgnJD)
	* [Deployment Tutorial](https://modelbest.feishu.cn/wiki/LTOKw3Hz7il9kGkCLX9czsennKe)
	* [HD Decoding Tutorial](https://modelbest.feishu.cn/wiki/Ug8iwdXfhiHVsDk2gGEco6xnnVg)
	* [Model Structure](https://modelbest.feishu.cn/wiki/ACtAw9bOgiBQ9lkWyafcvtVEnQf)