qingjun committed · Commit a042771 · Parent(s): b81976d

Update README.md: fix the heading for the Function Calling section, add a detailed introduction to function calling, and correct the link to the vLLM deployment guide. Adjust the section numbering to reflect the new content.

README.md CHANGED
@@ -60,7 +60,7 @@ pipeline_tag: image-text-to-text
 # MiniMax-VL-01

 ## 1. Introduction
-We are delighted to introduce our **MiniMax-VL-01** model. It adopts the
 MiniMax-VL-01 has a notable dynamic resolution feature. Input images are resized per a pre-set grid, with resolutions from 336×336 to 2016×2016, keeping a 336×336 thumbnail. The resized images are split into non-overlapping patches of the same size. These patches and the thumbnail are encoded separately and then combined for a full image representation.
 The training data for MiniMax-VL-01 consists of caption, description, and instruction data. The Vision Transformer (ViT) is trained on 694 million image-caption pairs from scratch. Across four distinct stages of the training pipeline, a total of 512 billion tokens are processed, leveraging this vast amount of data to endow the model with strong capabilities.
 Finally, MiniMax-VL-01 has reached top-level performance on multimodal leaderboards, demonstrating its edge and dependability in complex multimodal tasks.

@@ -190,9 +190,18 @@ For production deployment, we recommend using [vLLM](https://docs.vllm.ai/en/lat
 ⚡ Efficient and intelligent memory management
 📦 Powerful batch request processing capability
 ⚙️ Deeply optimized underlying performance
-For detailed deployment instructions, please refer to our [vLLM Deployment Guide](https://github.com/MiniMax-AI/MiniMax-01/blob/main/docs/

-

 ```
 @misc{minimax2025minimax01scalingfoundationmodels,

@@ -206,9 +215,9 @@ For detailed deployment instructions, please refer to our [vLLM Deployment Guide
 }
 ```

-##
 For general use and evaluation, we provide a [Chatbot](https://www.hailuo.ai/) with online search capabilities and the [online API](https://www.minimax.io/platform) for developers. We also provide the [MiniMax MCP Server](https://github.com/MiniMax-AI/MiniMax-MCP) with video generation, image generation, speech synthesis, and voice cloning for developers.

-##
 Contact us at [[email protected]](mailto:[email protected]).

 # MiniMax-VL-01

 ## 1. Introduction
+We are delighted to introduce our **MiniMax-VL-01** model. It adopts the "ViT-MLP-LLM" framework, a widely used design for multimodal large language models. The model is initialized and trained with three key parts: a 303-million-parameter Vision Transformer (ViT) for visual encoding, a randomly initialized two-layer MLP projector for image adaptation, and MiniMax-Text-01 as the base LLM.
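The "ViT-MLP-LLM" composition described above can be sketched in a few lines. This is an illustrative toy, not the actual implementation: the dimensions, the ReLU activation, and the token counts are invented stand-ins; only the overall structure (ViT features → two-layer MLP projector → concatenation with text tokens for the base LLM) follows the text.

```python
import numpy as np

# Toy dimensions; MiniMax-VL-01's real widths are much larger.
VIT_DIM, LLM_DIM = 64, 128
rng = np.random.default_rng(0)

# Randomly initialized two-layer MLP projector: ViT features -> LLM embedding space.
W1 = rng.standard_normal((VIT_DIM, LLM_DIM)) * 0.02
W2 = rng.standard_normal((LLM_DIM, LLM_DIM)) * 0.02

def project(vision_feats: np.ndarray) -> np.ndarray:
    hidden = np.maximum(vision_feats @ W1, 0.0)  # first layer + ReLU stand-in
    return hidden @ W2                           # second layer

vision_tokens = rng.standard_normal((256, VIT_DIM))  # ViT output for one image
image_embeds = project(vision_tokens)                # adapted image tokens
text_embeds = rng.standard_normal((32, LLM_DIM))     # embedded text prompt tokens

# Image and text tokens form one sequence that the base LLM consumes.
llm_input = np.concatenate([image_embeds, text_embeds], axis=0)
print(llm_input.shape)  # (288, 128)
```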
 MiniMax-VL-01 has a notable dynamic resolution feature. Input images are resized per a pre-set grid, with resolutions from 336×336 to 2016×2016, keeping a 336×336 thumbnail. The resized images are split into non-overlapping patches of the same size. These patches and the thumbnail are encoded separately and then combined for a full image representation.
 The training data for MiniMax-VL-01 consists of caption, description, and instruction data. The Vision Transformer (ViT) is trained on 694 million image-caption pairs from scratch. Across four distinct stages of the training pipeline, a total of 512 billion tokens are processed, leveraging this vast amount of data to endow the model with strong capabilities.
 Finally, MiniMax-VL-01 has reached top-level performance on multimodal leaderboards, demonstrating its edge and dependability in complex multimodal tasks.
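The dynamic-resolution scheme in the introduction can be sketched as follows. The grid-selection rule used here (pick the 336-multiple grid whose aspect ratio is closest to the input) is an assumption for illustration; the model's actual selection criterion may differ, and `choose_grid` / `split_into_patches` are hypothetical helper names.

```python
PATCH = 336  # base patch size; also the thumbnail resolution

def choose_grid(width: int, height: int, max_side: int = 2016) -> tuple[int, int]:
    """Pick a grid resolution (multiples of 336, up to 2016) closest in aspect
    ratio to the input image. Assumed selection rule, for illustration only."""
    best, best_diff = (PATCH, PATCH), float("inf")
    for gw in range(PATCH, max_side + 1, PATCH):
        for gh in range(PATCH, max_side + 1, PATCH):
            diff = abs(gw / gh - width / height)
            if diff < best_diff:
                best, best_diff = (gw, gh), diff
    return best

def split_into_patches(grid_w: int, grid_h: int) -> list[tuple[int, int]]:
    """Top-left corners of the non-overlapping 336x336 patches."""
    return [(x, y) for y in range(0, grid_h, PATCH) for x in range(0, grid_w, PATCH)]

gw, gh = choose_grid(1024, 768)          # a 4:3 input lands on a 4x3 grid
patches = split_into_patches(gw, gh)
# The patches plus one 336x336 thumbnail are encoded separately, then combined.
print(gw, gh, len(patches))  # 1344 1008 12
```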

 ⚡ Efficient and intelligent memory management
 📦 Powerful batch request processing capability
 ⚙️ Deeply optimized underlying performance
+For detailed deployment instructions, please refer to our [vLLM Deployment Guide](https://github.com/MiniMax-AI/MiniMax-01/blob/main/docs/vllm_deployment_guide.md).

+## 5. Function Calling
+MiniMax-VL-01 supports Function Calling, enabling the model to identify when external functions need to be called and to output their parameters in structured JSON format. With Function Calling, you can:
+
+- Let the model recognize implicit function-call needs in user requests
+- Receive structured parameter outputs for seamless application integration
+- Support various complex parameter types, including nested objects and arrays
+
+Function Calling uses standard OpenAI-compatible format definitions and integrates seamlessly with the Transformers library. For detailed usage instructions, please refer to our [Function Call Guide](./MiniMax-VL-01_Function_Call_Guide.md) or the [Chinese version](./MiniMax-VL-01_Function_Call_Guide_CN.md).
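Once a function-calling response comes back, the structured JSON arguments can be parsed and routed to a local handler. This is a minimal sketch assuming the OpenAI-compatible `tool_calls` shape the section describes; the sample payload and the `get_weather` handler are invented for illustration, not actual model output.

```python
import json

def dispatch(tool_call: dict) -> str:
    """Route one structured tool call to a local Python handler."""
    handlers = {"get_weather": lambda location: f"22°C in {location}"}
    name = tool_call["function"]["name"]
    # The model emits arguments as a JSON string, not a dict.
    args = json.loads(tool_call["function"]["arguments"])
    return handlers[name](**args)

# Shape of one entry in response.choices[0].message.tool_calls (invented sample):
sample = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "get_weather", "arguments": "{\"location\": \"Paris, France\"}"},
}
print(dispatch(sample))  # 22°C in Paris, France
```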
+
+## 6. Citation

 ```
 @misc{minimax2025minimax01scalingfoundationmodels,

 }
 ```

+## 7. Chatbot & API
 For general use and evaluation, we provide a [Chatbot](https://www.hailuo.ai/) with online search capabilities and the [online API](https://www.minimax.io/platform) for developers. We also provide the [MiniMax MCP Server](https://github.com/MiniMax-AI/MiniMax-MCP) with video generation, image generation, speech synthesis, and voice cloning for developers.


+## 8. Contact Us
 Contact us at [[email protected]](mailto:[email protected]).

test.py
ADDED
@@ -0,0 +1,29 @@
from openai import OpenAI

# The original commit hard-coded a live API key here; it is replaced with a
# placeholder, and the leaked key should be rotated.
client = OpenAI(
    api_key="<MINIMAX_API_KEY>",
    base_url="https://api.minimaxi.chat/v1/text/chatcompletion_v2",
)

# Tool definition in the OpenAI-compatible chat-completions format: the name,
# description, and parameters must be nested under a "function" key.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current temperature for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City and country e.g. Bogotá, Colombia"
                }
            },
            "required": ["location"],
            "additionalProperties": False
        }
    }
}]

response = client.chat.completions.create(
    model="MiniMax-Text-01",
    messages=[{"role": "user", "content": "What is the weather like in Paris today?"}],
    tools=tools,  # originally defined but never passed to the request
)

print(response)