qingjun committed
Commit a042771 · 1 Parent(s): b81976d

Update README.md: fix the heading for the Function Calling section, add a detailed introduction to function calling, and correct the link to the vLLM deployment guide. Adjust the section numbering to reflect the new content.

Files changed (2):
  1. README.md +14 -5
  2. test.py +29 -0
README.md CHANGED
@@ -60,7 +60,7 @@ pipeline_tag: image-text-to-text
  # MiniMax-VL-01
  
  ## 1. Introduction
- We are delighted to introduce our **MiniMax-VL-01** model. It adopts the ViT-MLP-LLM framework, which is a commonly used technique in the field of multimodal large language models. The model is initialized and trained with three key parts: a 303-million-parameter Vision Transformer (ViT) for visual encoding, a randomly initialized two-layer MLP projector for image adaptation, and MiniMax-Text-01 as the base LLM.
+ We are delighted to introduce our **MiniMax-VL-01** model. It adopts the "ViT-MLP-LLM" framework, which is a commonly used technique in the field of multimodal large language models. The model is initialized and trained with three key parts: a 303-million-parameter Vision Transformer (ViT) for visual encoding, a randomly initialized two-layer MLP projector for image adaptation, and MiniMax-Text-01 as the base LLM.
  MiniMax-VL-01 has a notable dynamic resolution feature. Input images are resized per a pre-set grid, with resolutions from 336×336 to 2016×2016, keeping a 336×336 thumbnail. The resized images are split into non-overlapping patches of the same size. These patches and the thumbnail are encoded separately and then combined for a full image representation.
  The training data for MiniMax-VL-01 consists of caption, description, and instruction data. The Vision Transformer (ViT) is trained on 694 million image-caption pairs from scratch. Across four distinct stages of the training pipeline, a total of 512 billion tokens are processed, leveraging this vast amount of data to endow the model with strong capabilities.
  Finally, MiniMax-VL-01 has reached top-level performance on multimodal leaderboards, demonstrating its edge and dependability in complex multimodal tasks.
@@ -190,9 +190,18 @@ For production deployment, we recommend using [vLLM](https://docs.vllm.ai/en/lat
  ⚡ Efficient and intelligent memory management
  📦 Powerful batch request processing capability
  ⚙️ Deeply optimized underlying performance
- For detailed deployment instructions, please refer to our [vLLM Deployment Guide](https://github.com/MiniMax-AI/MiniMax-01/blob/main/docs/vllm_deployment_guild.md).
+ For detailed deployment instructions, please refer to our [vLLM Deployment Guide](https://github.com/MiniMax-AI/MiniMax-01/blob/main/docs/vllm_deployment_guide.md).
  
- # 5. Citation
+ ## 5. Function Calling
+ MiniMax-VL-01 supports Function Calling, enabling the model to intelligently identify when an external function needs to be called and to output its parameters in structured JSON format. With Function Calling, you can:
+ 
+ - Let the model recognize implicit function-call needs in user requests
+ - Receive structured parameter outputs for seamless application integration
+ - Handle various complex parameter types, including nested objects and arrays
+ 
+ Function Calling supports standard OpenAI-compatible format definitions and integrates seamlessly with the Transformers library. For detailed usage instructions, please refer to our [Function Call Guide](./MiniMax-VL-01_Function_Call_Guide.md) or the [Chinese guide](./MiniMax-VL-01_Function_Call_Guide_CN.md).
+ 
+ ## 6. Citation
  
  ```
  @misc{minimax2025minimax01scalingfoundationmodels,
@@ -206,9 +215,9 @@ For detailed deployment instructions, please refer to our [vLLM Deployment Guide
  }
  ```
  
- ## 5. Chatbot & API
+ ## 7. Chatbot & API
  For general use and evaluation, we provide a [Chatbot](https://www.hailuo.ai/) with online search capabilities and the [online API](https://www.minimax.io/platform) for developers. We also provide the [MiniMax MCP Server](https://github.com/MiniMax-AI/MiniMax-MCP) with video generation, image generation, speech synthesis, and voice cloning for developers.
  
  
- ## 6. Contact Us
+ ## 8. Contact Us
  Contact us at [[email protected]](mailto:[email protected]).
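
The dynamic-resolution scheme described in the introduction above (resize to a preset grid resolution between 336×336 and 2016×2016, split the result into non-overlapping 336×336 patches, and keep a 336×336 thumbnail) can be sketched as follows. This is a minimal illustration based only on the README's wording, not the repository's actual preprocessing code, and the helper name `split_into_patches` is hypothetical.

```python
# Minimal sketch of the dynamic-resolution preprocessing described in the
# README introduction. Illustrative only; not MiniMax's actual code.
from PIL import Image

PATCH = 336  # patch and thumbnail side length stated in the README


def split_into_patches(image: Image.Image, grid_w: int, grid_h: int):
    """Resize to a grid-aligned resolution, split into non-overlapping
    336x336 patches, and keep a 336x336 thumbnail of the whole image."""
    assert grid_w % PATCH == 0 and grid_h % PATCH == 0
    assert PATCH <= grid_w <= 2016 and PATCH <= grid_h <= 2016
    resized = image.resize((grid_w, grid_h))
    patches = [
        resized.crop((x, y, x + PATCH, y + PATCH))
        for y in range(0, grid_h, PATCH)
        for x in range(0, grid_w, PATCH)
    ]
    thumbnail = image.resize((PATCH, PATCH))
    # Patches and thumbnail are encoded separately, then combined.
    return patches, thumbnail
```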
test.py ADDED
@@ -0,0 +1,29 @@
+ from openai import OpenAI
+ 
+ client = OpenAI(api_key="eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJHcm91cE5hbWUiOiJtbyBuaSIsIlVzZXJOYW1lIjoibW8gbmkiLCJBY2NvdW50IjoiIiwiU3ViamVjdElEIjoiMTg3NjIwMDY0ODA2NDYzNTI0MiIsIlBob25lIjoiIiwiR3JvdXBJRCI6IjE4NzYyMDA2NDgwNjA0NDA5MzgiLCJQYWdlTmFtZSI6IiIsIk1haWwiOiJuaW1vQHN1YnN1cC52aXAiLCJDcmVhdGVUaW1lIjoiMjAyNS0wMS0wNyAxMToyNzowNyIsIlRva2VuVHlwZSI6MSwiaXNzIjoibWluaW1heCJ9.Ge1ZnpFPUfXVdMini0P_qXbP_9VYwzXiffG9DsNQck4GtYEOs33LDeAiwrVsrrLZfvJ2icQZ4sRZS54wmPuWua_Dav6pYJty8ZtahmUX1IuhlUX5YErhhCRAIy3J1xB8FkLHLyylChuBHpkNz6O6BQLmPqmoa-cOYK9Qrc6IDeu8SX1iMzO9-MSkcWNvkvpCF2Pf9tekBVWNKMDK6IZoMEPbtkaPXdDyP6l0M0e2AlL_E0oM9exg3V-ohAi8OTPFyqM6dcd4TwF-b9DULxfIsRFw401mvIxcTDWa42u2LULewdATVRD2BthU65tuRqEiWeFWMvFlPj2soMze_QIiUA", base_url="https://api.minimaxi.chat/v1/text/chatcompletion_v2")
+ 
+ tools = [{
+     "type": "function",
+     "function": {  # the chat.completions API expects the definition nested under a "function" key
+         "name": "get_weather",
+         "description": "Get current temperature for a given location.",
+         "parameters": {
+             "type": "object",
+             "properties": {
+                 "location": {
+                     "type": "string",
+                     "description": "City and country e.g. Bogotá, Colombia"
+                 }
+             },
+             "required": ["location"],
+             "additionalProperties": False
+         }
+     }
+ }]
+ 
+ response = client.chat.completions.create(
+     model="MiniMax-Text-01",
+     messages=[{"role": "user", "content": "What is the weather like in Paris today?"}],
+     tools=tools,  # pass the schema so the model can emit a structured function call
+ )
+ print(response)
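
Continuing from the script above: assuming the endpoint follows the standard OpenAI-compatible tool-calling response shape (a `message.tool_calls` list whose `function.arguments` field is a JSON string), the returned call could be handled roughly as follows. This is a minimal sketch; `get_weather` here is a hypothetical stand-in, and the exact response fields should be verified against MiniMax's API documentation.

```python
import json

# Hedged sketch of handling the tool call returned by the request above,
# assuming the standard OpenAI-compatible response shape.
def get_weather(location: str) -> str:
    # Hypothetical stand-in; a real application would query a weather service.
    return f"Sunny, 22°C in {location}"

message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)  # structured JSON arguments
    result = get_weather(**args)

    # Return the tool result so the model can compose a final answer.
    followup = client.chat.completions.create(
        model="MiniMax-Text-01",
        messages=[
            {"role": "user", "content": "What is the weather like in Paris today?"},
            message,  # the assistant turn that contains the tool call
            {"role": "tool", "tool_call_id": call.id, "content": result},
        ],
        tools=tools,
    )
    print(followup.choices[0].message.content)
```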