Correct the spelling error in the README file: change "guild" to "guide"
#11 by QscQ · opened

README.md CHANGED
@@ -60,7 +60,7 @@ pipeline_tag: image-text-to-text
 # MiniMax-VL-01
 
 ## 1. Introduction
-We are delighted to introduce our **MiniMax-VL-01** model. It adopts the
+We are delighted to introduce our **MiniMax-VL-01** model. It adopts the "ViT-MLP-LLM" framework, which is a commonly used technique in the field of multimodal large language models. The model is initialized and trained with three key parts: a 303-million-parameter Vision Transformer (ViT) for visual encoding, a randomly initialized two-layer MLP projector for image adaptation, and the MiniMax-Text-01 as the base LLM.
 MiniMax-VL-01 has a notable dynamic resolution feature. Input images are resized per a pre-set grid, with resolutions from 336×336 to 2016×2016, keeping a 336×336 thumbnail. The resized images are split into non-overlapping patches of the same size. These patches and the thumbnail are encoded separately and then combined for a full image representation.
 The training data for MiniMax-VL-01 consists of caption, description, and instruction data. The Vision Transformer (ViT) is trained on 694 million image-caption pairs from scratch. Across four distinct stages of the training pipeline, a total of 512 billion tokens are processed, leveraging this vast amount of data to endow the model with strong capabilities.
 Finally, MiniMax-VL-01 has reached top-level performance on multimodal leaderboards, demonstrating its edge and dependability in complex multimodal tasks.
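The paragraph added in this hunk describes a "ViT-MLP-LLM" stack plus a grid-based dynamic-resolution scheme. A minimal sketch of both ideas, assuming a simple aspect-ratio rule for grid selection and hypothetical module names and shapes (this is illustrative, not MiniMax's actual preprocessing or modeling code):

```python
# Hypothetical sketch of the "ViT-MLP-LLM" composition and the dynamic-resolution
# tiling described above; names, shapes, and the grid rule are assumptions.
from PIL import Image
import torch
import torch.nn as nn

TILE = 336  # base resolution; grids run from 336x336 up to 2016x2016 (6x6 tiles)

def pick_grid(w: int, h: int, max_tiles: int = 6) -> tuple[int, int]:
    """Pick a (cols, rows) grid of 336px tiles that best matches the aspect ratio."""
    best, best_err = (1, 1), float("inf")
    for cols in range(1, max_tiles + 1):
        for rows in range(1, max_tiles + 1):
            err = abs(cols / rows - w / h)
            if err < best_err:
                best, best_err = (cols, rows), err
    return best

def tile_image(img: Image.Image) -> list[Image.Image]:
    """Resize to the chosen grid, split into non-overlapping 336x336 tiles,
    and append a 336x336 thumbnail of the whole image."""
    cols, rows = pick_grid(*img.size)
    resized = img.resize((cols * TILE, rows * TILE))
    tiles = [resized.crop((c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE))
             for r in range(rows) for c in range(cols)]
    tiles.append(img.resize((TILE, TILE)))  # global thumbnail
    return tiles

class VisionToLLM(nn.Module):
    """ViT encoder -> randomly initialized two-layer MLP projector -> LLM embedding space."""
    def __init__(self, vit: nn.Module, vit_dim: int, llm_dim: int):
        super().__init__()
        self.vit = vit  # stand-in for a ~303M-parameter Vision Transformer
        self.projector = nn.Sequential(
            nn.Linear(vit_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim))

    def forward(self, tiles: list[torch.Tensor]) -> torch.Tensor:
        # Each tile (and the thumbnail) is encoded separately; the resulting token
        # sequences are concatenated into one image representation for the LLM.
        feats = [self.vit(t) for t in tiles]          # each: (1, n_tokens, vit_dim)
        return self.projector(torch.cat(feats, dim=1))  # (1, total_tokens, llm_dim)
```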
@@ -190,9 +190,18 @@ For production deployment, we recommend using [vLLM](https://docs.vllm.ai/en/lat
 ⚡ Efficient and intelligent memory management
 📦 Powerful batch request processing capability
 ⚙️ Deeply optimized underlying performance
-For detailed deployment instructions, please refer to our [vLLM Deployment Guide](https://github.com/MiniMax-AI/MiniMax-01/blob/main/docs/vllm_deployment_guild.md).
+For detailed deployment instructions, please refer to our [vLLM Deployment Guide](https://github.com/MiniMax-AI/MiniMax-01/blob/main/docs/vllm_deployment_guide.md).
 
-
+## 5. Function Calling
+MiniMax-VL-01 supports Function Calling capability, enabling the model to intelligently identify when external functions need to be called and output parameters in structured JSON format. With Function Calling, you can:
+
+- Let the model recognize implicit function call needs in user requests
+- Receive structured parameter outputs for seamless application integration
+- Support various complex parameter types, including nested objects and arrays
+
+Function Calling supports standard OpenAI-compatible format definitions and integrates seamlessly with the Transformers library. For detailed usage instructions, please refer to our [Function Call Guide](./MiniMax-VL-01_Function_Call_Guide.md) or [Chinese Guide](./MiniMax-VL-01_Function_Call_Guide_CN.md).
+
+## 6. Citation
 
 ```
 @misc{minimax2025minimax01scalingfoundationmodels,
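The new Function Calling section describes OpenAI-compatible tool definitions and structured JSON argument output. A minimal sketch of what that looks like in practice, assuming a hypothetical get_weather function rather than anything taken from the linked guide:

```python
# Hypothetical OpenAI-compatible tool definition plus the parse step for the
# structured JSON arguments the model emits; the function name, schema, and
# example output are illustrative assumptions.
import json

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # assumed example function
        "description": "Look up current weather for a city.",
        "parameters": {  # JSON Schema; nested objects and arrays are supported
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}]

# When the model decides a call is needed, it emits the function name plus the
# arguments as a JSON string, which the application parses and dispatches.
raw_arguments = '{"city": "Shanghai", "units": "celsius"}'
args = json.loads(raw_arguments)
print(args["city"])  # -> Shanghai
```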
@@ -206,9 +215,9 @@ For detailed deployment instructions, please refer to our [vLLM Deployment Guide
 }
 ```
 
-##
+## 7. Chatbot & API
 For general use and evaluation, we provide a [Chatbot](https://www.hailuo.ai/) with online search capabilities and the [online API](https://www.minimax.io/platform) for developers. For general use and evaluation, we provide the [MiniMax MCP Server](https://github.com/MiniMax-AI/MiniMax-MCP) with video generation, image generation, speech synthesis, and voice cloning for developers.
 
 
-##
+## 8. Contact Us
 Contact us at [[email protected]](mailto:[email protected]).
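On the deployment pointer above: vLLM serves an OpenAI-compatible HTTP API, so once the model is launched a client call might look like the sketch below. The endpoint, API key, model id, and image URL are placeholder assumptions; consult the linked vLLM Deployment Guide for the actual launch flags and supported options.

```python
# Minimal client sketch against a vLLM OpenAI-compatible endpoint, assuming the
# model was started with something like: vllm serve MiniMaxAI/MiniMax-VL-01
# The localhost URL, dummy key, and image URL are placeholders for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-VL-01",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```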