Add paper link and abstract to model card

#1
by nielsr (HF Staff) - opened
Files changed (1)
  1. README.md +11 -6
README.md CHANGED
@@ -1,11 +1,12 @@
 ---
-license: apache-2.0
 language:
 - zh
 - en
-pipeline_tag: text-generation
 library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
 ---
+
 <div align="center">
 <img src="https://github.com/OpenBMB/MiniCPM/blob/main/assets/minicpm_logo.png?raw=true" width="500em" ></img>
 </div>
@@ -18,6 +19,12 @@ library_name: transformers
 πŸ‘‹ Join us on <a href="https://discord.gg/3cGQn9b3YM" target="_blank">Discord</a> and <a href="https://github.com/OpenBMB/MiniCPM/blob/main/assets/wechat.jpg" target="_blank">WeChat</a>
 </p>
 
+This model is the Int4 version of MiniCPM4-0.5B, trained with quantization-aware training (QAT) and stored in fake-quantization style. It is described in the paper [MiniCPM4: Ultra-Efficient LLMs on End Devices](https://huggingface.co/papers/2506.07900).
+
+**Abstract:**
+
+This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelerates both prefilling and decoding phases for long-context processing. Regarding training data, we propose UltraClean, an efficient and accurate pre-training data filtering and generation strategy, and UltraChat v2, a comprehensive supervised fine-tuning dataset. These datasets enable satisfactory model performance to be achieved using just 8 trillion training tokens. Regarding training algorithms, we propose ModelTunnel v2 for efficient pre-training strategy search, and improve existing post-training methods by introducing chunk-wise rollout for load-balanced reinforcement learning and BitCPM, a data-efficient ternary LLM. Regarding inference systems, we propose CPM.cu, which integrates sparse attention, model quantization, and speculative sampling to achieve efficient prefilling and decoding. To meet diverse on-device requirements, MiniCPM4 is available in two versions, with 0.5B and 8B parameters, respectively. Evaluation results show that MiniCPM4 outperforms open-source models of similar size across multiple benchmarks, highlighting both its efficiency and effectiveness. Notably, MiniCPM4-8B demonstrates significant speed improvements over Qwen3-8B when processing long sequences. Through further adaptation, MiniCPM4 successfully powers diverse applications, including trustworthy survey generation and tool use with model context protocol, clearly showcasing its broad usability.
+
 ## What's New
 - [2025.06.06] **MiniCPM4** series are released! This model achieves ultimate efficiency improvements while maintaining optimal performance at the same scale! It can achieve over 5x generation acceleration on typical end-side chips! You can find the technical report [here](https://github.com/OpenBMB/MiniCPM/tree/main/report/MiniCPM_4_Technical_Report.pdf).πŸ”₯πŸ”₯πŸ”₯
@@ -50,7 +57,7 @@ MiniCPM 4 is an extremely efficient edge-side large model that has undergone eff
 
 - πŸ“š **High-Quality Training Data:**
   - UltraClean -- High-quality Pre-training Data Filtering and Generation: Builds iterative data cleaning strategies based on efficient data verification, open-sourcing the high-quality Chinese and English pre-training dataset [UltraFinweb](https://huggingface.co/datasets/openbmb/Ultra-FineWeb)
-  - UltraChat v2 -- High-quality Supervised Fine-tuning Data Generation: Constructs large-scale high-quality supervised fine-tuning datasets covering multiple dimensions including knowledge-intensive data, reasoning-intensive data, instruction-following data, long text understanding data, and tool calling data
+  - UltraChat v2 -- High-quality Supervised Fine-tuning Data Generation: Constructs large-scale high-quality supervised fine-tuning datasets covering multiple dimensions including knowledge-intensive data, reasoning-intensive data, instruction-following data, and tool calling data
 
 - ⚑ **Efficient Inference System:**
   - CPM.cu -- Lightweight and Efficient CUDA Inference Framework: Integrates sparse attention, model quantization, and speculative sampling to achieve efficient prefilling and decoding
@@ -129,8 +136,6 @@ print(outputs[0].outputs[0].text)
 | MBPP | 47.86 | 47.47 | 59.92 | 59.14 | 55.64 | 55.25 |
 | AVERAGE | 44.73 | 36.66 | 42.15 | 58.00 | 55.68 | 55.45 |
 
-
-
 ## Statement
 - As a language model, MiniCPM generates content by learning from a vast amount of text.
 - However, it does not possess the ability to comprehend or express personal opinions or value judgments.
@@ -149,4 +154,4 @@ print(outputs[0].outputs[0].text)
 author={MiniCPM Team},
 year={2025}
 }
-```
\ No newline at end of file
+```
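
A note on the storage format mentioned in the added intro line: "fake quantization" rounds the weights to int4 levels but stores the result back in floating point, so the checkpoint can be loaded as an ordinary float model while behaving like the quantized one. Below is a minimal sketch of symmetric per-group int4 fake quantization; the function name and group size are illustrative assumptions, not details taken from the model card or the paper.

```python
import torch

def fake_quantize_int4(w: torch.Tensor, group_size: int = 128) -> torch.Tensor:
    """Round weights to the 16 int4 levels, then dequantize immediately,
    so the returned tensor keeps its float dtype ("fake" quantization)."""
    g = w.reshape(-1, group_size)                               # per-group scaling
    scale = g.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / 7.0
    q = torch.round(g / scale).clamp(min=-8, max=7)             # quantize to the int4 range
    return (q * scale).reshape(w.shape)                         # dequantize back to float

w = torch.randn(256, 256)
w_fq = fake_quantize_int4(w)
print((w - w_fq).abs().max())  # small rounding error; w_fq is still fp32
```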
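The CPM.cu bullet and the abstract both cite speculative sampling as one source of decoding speedup. As a reference point only (this is the generic draft-and-verify acceptance rule, not CPM.cu's actual implementation): a small draft model proposes tokens cheaply, and the target model verifies them, keeping a drafted token x with probability min(1, p(x)/q(x)) and otherwise resampling from the residual distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def verify_drafted_token(p_target: np.ndarray, q_draft: np.ndarray, token: int):
    """Generic speculative-sampling acceptance step.
    p_target / q_draft: target and draft probability vectors over the vocab
    (token was sampled from q_draft, so q_draft[token] > 0).
    Returns (emitted_token, was_accepted)."""
    if rng.random() < min(1.0, p_target[token] / q_draft[token]):
        return token, True                        # keep the cheap draft token
    residual = np.maximum(p_target - q_draft, 0)  # resample where target exceeds draft
    residual /= residual.sum()
    return int(rng.choice(len(p_target), p=residual)), False
```

Accepted tokens cost only a verification pass on the target model rather than a full sequential decode step, which is where the generation acceleration comes from.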