manaestras committed
Commit 10e78a4 · verified · 1 Parent(s): a81b348

Update README.md

Files changed (1)
  1. README.md +76 -21
README.md CHANGED
@@ -1,16 +1,3 @@
- ---
- license: other
- license_name: tencent-hunyuan-a13b
- license_link: https://github.com/Tencent-Hunyuan/Hunyuan-A13B/blob/main/LICENSE
- ---
-
- <p align="left">
- <a href="README_CN.md">中文</a>&nbsp | English</a>
- </p>
- <br><br>
-
-
-
  <p align="center">
  <img src="https://dscache.tencent-cloud.cn/upload/uploader/hunyuan-64b418fd052c033b228e04bc77bbc4b54fd7f5bc.png" width="400"/> <br>
  </p><p></p>
@@ -24,8 +11,10 @@ license_link: https://github.com/Tencent-Hunyuan/Hunyuan-A13B/blob/main/LICENSE
  <img src="https://avatars.githubusercontent.com/u/109945100?s=200&v=4" width="16"/>&nbsp;<a href="https://modelscope.cn/models/Tencent-Hunyuan/Hunyuan-A13B-Instruct"><b>ModelScope</b></a>
  </p>

+
  <p align="center">
- <a href="https://github.com/Tencent/Hunyuan-A13B"><b>GITHUB</b></a>
+ <a href="https://github.com/Tencent/Hunyuan-A13B"><b>GITHUB</b></a> |
+ <a href="https://github.com/Tencent-Hunyuan/Hunyuan-A13B/blob/main/LICENSE"><b>LICENSE</b></a>
  </p>


@@ -51,7 +40,7 @@ As a powerful yet computationally efficient large model, Hunyuan-A13B is an idea
  &nbsp;

  ## Related News
- * 2025.6.27 We have open-sourced **Hunyuan-A13B-Pretrain** , **Hunyuan-A13B-Instruct** , **Hunyuan-A13B-Instruct-FP8** , **Hunyuan-80B-A13B-Instruct-GPTQ-Int4** on Hugging Face.
+ * 2025.6.27 We have open-sourced **Hunyuan-A13B-Pretrain**, **Hunyuan-A13B-Instruct**, **Hunyuan-A13B-Instruct-FP8**, and **Hunyuan-A13B-Instruct-GPTQ-Int4** on Hugging Face.
  <br>


@@ -131,10 +120,75 @@ print(f"thinking_content:{think_content}\n\n")
  print(f"answer_content:{answer_content}\n\n")
  ```

+ ## Quantitative Compression
+ We used our own `AngleSlim` compression tool to produce the FP8 and INT4 quantized models. `AngleSlim` is expected to be open-sourced in early July and will support one-click quantization and compression of large models; until then, you can download our quantized models directly for deployment testing.
+
+ ### FP8 Quantization
+ We use static FP8 quantization: model weights and activation values are converted to an 8-bit floating-point format, with the quantization scales pre-determined from a small amount of calibration data (no training required), improving inference efficiency and lowering the deployment threshold. You can quantize the model yourself with `AngleSlim`, or directly download and use our open-source quantized model [Hunyuan-A13B-Instruct-FP8](https://huggingface.co/tencent/Hunyuan-A13B-Instruct-FP8).
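To make the static-scale idea concrete, here is a minimal sketch of per-tensor FP8 (e4m3) calibration and quantization in PyTorch. It is independent of `AngleSlim` (whose API is not yet public); the function names are illustrative, not part of any released tool.

```python
import torch

FP8_E4M3_MAX = 448.0  # largest finite value of torch.float8_e4m3fn (PyTorch >= 2.1)

def calibrate_scale(calib_activations: list[torch.Tensor]) -> torch.Tensor:
    # Static quantization: the scale is frozen ahead of time from the
    # maximum absolute value observed over a small calibration set.
    amax = max(t.abs().max() for t in calib_activations)
    return amax / FP8_E4M3_MAX

def quantize_fp8(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Scale into the representable range, then cast to FP8.
    return (x / scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX).to(torch.float8_e4m3fn)

def dequantize(x_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return x_fp8.to(torch.float16) * scale

# Calibrate once on sample activations, then reuse the frozen scale at inference.
calib = [torch.randn(16, 128) for _ in range(8)]
scale = calibrate_scale(calib)
x = torch.randn(16, 128)
x_q = quantize_fp8(x, scale)
print(dequantize(x_q, scale).sub(x).abs().mean())  # small reconstruction error
```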
+
+ #### FP8 Benchmark
+ This subsection presents the benchmark results for the Hunyuan-A13B-Instruct-FP8 quantized model.
+
+ | Bench | Hunyuan-A13B-Instruct | Hunyuan-A13B-Instruct-FP8 |
+ |:---------:|:---------------------:|:-------------------------:|
+ | AIME 2024 | 87.3 | 86.7 |
+ | GSM8K | 94.39 | 94.01 |
+ | BBH | 89.1 | 88.34 |
+ | DROP | 91.1 | 91.1 |
+
+ ### Int4 Quantization
+ We use the GPTQ algorithm for W4A16 quantization: the model weights are processed layer by layer, using a small amount of calibration data to minimize the reconstruction error of the quantized weights, which are adjusted through an optimization procedure based on an approximation of the inverse Hessian. The process requires no retraining and only a small amount of calibration data, improving inference efficiency and lowering the deployment threshold. You can quantize the model yourself with `AngleSlim`, or directly download and use our open-source quantized model [Hunyuan-A13B-Instruct-Int4](https://huggingface.co/tencent/Hunyuan-A13B-Instruct-GPTQ-Int4).
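GPTQ's inverse-Hessian updates are too involved for a short example, but the W4A16 format it produces (4-bit integer weights with higher-precision per-group scales, activations kept in 16-bit) can be illustrated with the simpler round-to-nearest baseline below; `group_size=128` is a common but illustrative choice.

```python
import torch

def quantize_w4a16(w: torch.Tensor, group_size: int = 128):
    """Round-to-nearest 4-bit weight quantization with per-group scales.

    This is the plain RTN baseline; GPTQ improves on it by error-compensating
    each column using an inverse-Hessian approximation, but the resulting
    storage format (int4 weights + fp16 scales) is the same.
    """
    out_features, in_features = w.shape
    wg = w.reshape(out_features, in_features // group_size, group_size)
    scale = wg.abs().amax(dim=-1, keepdim=True) / 7.0  # symmetric int4: [-8, 7]
    scale = scale.clamp(min=1e-8)                      # guard all-zero groups
    q = (wg / scale).round().clamp(-8, 7).to(torch.int8)  # packed to 4 bits in practice
    return q, scale

def dequantize_w4a16(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    out_features = q.shape[0]
    return (q.to(torch.float16) * scale.to(torch.float16)).reshape(out_features, -1)

w = torch.randn(8, 256)
q, s = quantize_w4a16(w)
w_hat = dequantize_w4a16(q, s)
print((w - w_hat).abs().mean())  # quantization error of the RTN baseline
```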
+
+ #### Int4 Benchmark
+ This subsection presents the benchmark results for the Hunyuan-A13B-Instruct-GPTQ-Int4 quantized model.
+
+ | Bench | Hunyuan-A13B-Instruct | Hunyuan-A13B-Instruct-GPTQ-Int4 |
+ |:--------------:|:---------------------:|:-------------------------------:|
+ | OlympiadBench | 82.7 | 84.0 |
+ | AIME 2024 | 87.3 | 86.7 |
+ | GSM8K | 94.39 | 94.24 |
+ | BBH | 88.34 | 87.91 |
+ | DROP | 91.12 | 91.05 |
+

  ## Deployment

- For deployment, you can use frameworks such as **vLLM**, **SGLang**, or **TensorRT-LLM** to serve the model and create an OpenAI-compatible API endpoint.
+ For deployment, you can use frameworks such as **TensorRT-LLM**, **vLLM**, or **SGLang** to serve the model and create an OpenAI-compatible API endpoint.
+
+ Docker images: https://hub.docker.com/r/hunyuaninfer/hunyuan-a13b/tags
+
+
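Whichever backend you choose, the resulting endpoint speaks the OpenAI chat-completions protocol, so a standard client can exercise it. A minimal sketch, assuming a server on localhost:8000 (as in the TensorRT-LLM example below) and that the served model name matches the path passed at launch:

```python
from openai import OpenAI  # pip install openai

# Any of the OpenAI-compatible servers (trtllm-serve, vLLM, SGLang) can be
# queried the same way; only base_url and the served model name differ.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="HunYuan-moe-A13B",  # must match the name the server registers
    messages=[{"role": "user", "content": "Briefly introduce Hunyuan-A13B."}],
    max_tokens=512,
)
print(response.choices[0].message.content)
```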
+ ### TensorRT-LLM
+
+ #### Docker Image
+
+ We provide a pre-built Docker image based on the latest version of TensorRT-LLM.
+
+ - To get started, pull the image:
+
+ https://hub.docker.com/r/hunyuaninfer/hunyuan-a13b/tags
+
+ ```
+ docker pull hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-trtllm
+ ```
+
+ - Start the API server:
+
+ ```
+ docker run --name hunyuanLLM_infer --rm -it --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --gpus=all hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-trtllm
+ ```
+
+ Then, inside the container, launch the OpenAI-compatible server:
+
+ ```
+ trtllm-serve \
+   /path/to/HunYuan-moe-A13B \
+   --host localhost \
+   --port 8000 \
+   --backend pytorch \
+   --max_batch_size 128 \
+   --max_num_tokens 16384 \
+   --tp_size 2 \
+   --kv_cache_free_gpu_memory_fraction 0.95 \
+   --extra_llm_api_options /path/to/extra-llm-api-config.yml
+ ```
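As a rule of thumb, `--tp_size 2` shards the model across two GPUs, so set it to the number of devices you intend to use, and adjust `--max_batch_size`, `--max_num_tokens`, and `--kv_cache_free_gpu_memory_fraction` to your memory budget; the exact contents of `extra-llm-api-config.yml` depend on your TensorRT-LLM version.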


  ### vllm

@@ -144,11 +198,10 @@ We provide a pre-built Docker image containing vLLM 0.8.5 with full support for

  - To get started:

- https://hub.docker.com/r/hunyuaninfer/hunyuan-large/tags
-
  ```
+ docker pull docker.cnb.cool/tencent/hunyuan/hunyuan-a13b:hunyuan-moe-A13B-vllm
+ or
  docker pull hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm
-
  ```

  - Download Model file:
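Once the model files are in place, vLLM's offline Python API is a quick way to verify the setup inside the pulled image before standing up the API server; a minimal sketch (the model path and `tensor_parallel_size` are illustrative and should match your download location and GPU count):

```python
from vllm import LLM, SamplingParams

# Offline batch inference: loads the checkpoint directly, no server needed.
llm = LLM(
    model="tencent/Hunyuan-A13B-Instruct",  # or the local download path
    tensor_parallel_size=4,                 # illustrative; match your GPUs
    trust_remote_code=True,
)
params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Briefly introduce the Hunyuan-A13B model."], params)
print(outputs[0].outputs[0].text)
```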
 
@@ -190,7 +243,9 @@ To get started:
  - Pull the Docker image

  ```
- docker pull tiacc-test.tencentcloudcr.com/tiacc/sglang:0.4.7
+ docker pull docker.cnb.cool/tencent/hunyuan/hunyuan-a13b:hunyuan-moe-A13B-sglang
+ or
+ docker pull hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-sglang
  ```

  - Start the API server:
 
@@ -200,7 +255,7 @@ docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  --ipc=host \
- tiacc-test.tencentcloudcr.com/tiacc/sglang:0.4.7 \
+ docker.cnb.cool/tencent/hunyuan/hunyuan-a13b:hunyuan-moe-A13B-sglang \
  -m sglang.launch_server --model-path hunyuan/huanyuan_A13B --tp 4 --trust-remote-code --host 0.0.0.0 --port 30000
  ```

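Once the container is up, the SGLang server listens on port 30000 and exposes the same OpenAI-compatible routes; a quick smoke test (a sketch; the model name follows the `--model-path` above):

```python
import requests

# Smoke-test the SGLang server's OpenAI-compatible endpoint on port 30000.
resp = requests.post(
    "http://localhost:30000/v1/chat/completions",
    json={
        "model": "hunyuan/huanyuan_A13B",  # matches --model-path above
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```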