Model Overview
- Model Architecture: DeepSeek-R1
- Input: Text
- Output: Text
- Supported Hardware Microarchitecture: AMD MI350/MI355
- ROCm: 7.0
- Operating System(s): Linux
- Inference Engine: SGLang
- Model Optimizer: AMD-Quark
- Weight quantization: OCP MXFP4
- Activation quantization: OCP MXFP4
- Calibration Dataset: Pile
This model was built from deepseek-ai/DeepSeek-R1 by applying AMD-Quark for MXFP4 quantization.
Model Quantization
The model was quantized from deepseek-ai/DeepSeek-R1 using AMD-Quark. Both weights and activations were quantized to MXFP4 format, and the AutoSmoothQuant algorithm was applied to enhance accuracy.
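To make the MXFP4 format concrete, the following is a minimal sketch of quantizing one block under the OCP Microscaling convention: the values in a block (32 with the group size used for this model) share a single power-of-two scale, and each element is stored as FP4 E2M1. The function name `mxfp4_fake_quantize` is hypothetical, ties are resolved toward the smaller magnitude for simplicity, and the clamping of the shared exponent to its 8-bit range is omitted. This is an illustration of the number format only, not AMD-Quark's implementation, and it does not include AutoSmoothQuant.

```python
import math

# Non-negative magnitudes representable in FP4 E2M1
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_EMAX = 2  # exponent of the largest E2M1 value: 6.0 = 1.5 * 2**2


def mxfp4_fake_quantize(block):
    """Quantize one block to MXFP4 and return (shared_scale, dequantized_values).

    Hypothetical helper for illustration; a real kernel would pack the
    4-bit codes and the 8-bit shared exponent instead of returning floats.
    """
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)

    # Shared power-of-two scale: align the block maximum with the E2M1 range.
    shared_exp = math.floor(math.log2(amax)) - E2M1_EMAX
    scale = 2.0 ** shared_exp

    dequantized = []
    for v in block:
        # Scale into the element range and saturate at the largest E2M1 value.
        mag = min(abs(v) / scale, E2M1_GRID[-1])
        # Snap to the nearest grid point (simplified tie-breaking).
        nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
        dequantized.append(math.copysign(nearest, v) * scale)
    return scale, dequantized


# Usage: a short block for readability; real blocks hold 32 values.
scale, values = mxfp4_fake_quantize([0.1, -0.4, 1.0, 3.2])
```

Because the scale is derived from the largest magnitude in the block, small values that share a block with a large outlier lose the most precision, which is one reason a smoothing step can help accuracy.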
Preprocessing requirement:
Before executing the quantization script below, the original FP8 model must first be dequantized to BFloat16. You can either perform the dequantization manually using this conversion script, or use the pre-converted BFloat16 model available at unsloth/DeepSeek-R1-BF16.
Quantization scripts:
cd Quark/examples/torch/language_modeling/llm_ptq/
python3 quantize_quark.py --model_dir $MODEL_DIR \
--quant_scheme w_mxfp4_a_mxfp4 \
--group_size 32 \
--num_calib_data 128 \
--exclude_layers "*self_attn*" "*mlp.gate.*" "*lm_head" \
--multi_gpu \
--quant_algo autosmoothquant \
--model_export hf_format \
--output_dir amd/DeepSeek-R1-MXFP4
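The `--exclude_layers` patterns in the command above keep the attention layers, the MoE router gate, and `lm_head` out of quantization. The sketch below shows which names such patterns would select, assuming shell-style wildcard matching as provided by Python's `fnmatch`; whether AMD-Quark matches names in exactly this way is an assumption, and the parameter names in the list are hypothetical examples rather than names read from the checkpoint.

```python
from fnmatch import fnmatchcase

# Patterns copied from the quantization command.
EXCLUDE_PATTERNS = ["*self_attn*", "*mlp.gate.*", "*lm_head"]


def is_excluded(name, patterns=EXCLUDE_PATTERNS):
    """Return True if a name matches any exclusion pattern (assumed glob semantics)."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


# Hypothetical parameter names, for illustration only.
names = [
    "model.layers.3.self_attn.q_a_proj.weight",
    "model.layers.3.mlp.gate.weight",
    "model.layers.3.mlp.experts.0.gate_proj.weight",
    "lm_head",
]

# Names that would still be quantized under these patterns.
quantized = [name for name in names if not is_excluded(name)]
```

Under this assumption, `*mlp.gate.*` selects the router gate but not the experts' `gate_proj` weights, so the expert projections remain quantized.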
Deployment
Use with SGLang
This model can be deployed efficiently using the SGLang backend.
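Once a server has been started with the `sglang.launch_server` command listed in the Reproduction section, it can be queried through its OpenAI-compatible completions endpoint. The following is a minimal client sketch using only the Python standard library. The URL is taken from the evaluation commands in this card; the helper names `build_completion_request` and `complete`, the default `max_tokens`, and the example prompt are assumptions made for illustration.

```python
import json
import urllib.request


def build_completion_request(
    prompt,
    base_url="http://localhost:30000/v1/completions",
    model="amd/DeepSeek-R1-MXFP4",
    max_tokens=256,
    temperature=0.6,
    top_p=0.95,
):
    """Build (but do not send) a POST request for the completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        base_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["choices"][0]["text"]


# Usage (requires a running server):
#   print(complete("What is 12 * 13? Think step by step."))
```

The sampling defaults mirror the `temperature=0.6` and `top_p=0.95` settings used for the AIME2024 and GPQA Diamond evaluations.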
Evaluation
The model was evaluated on AIME2024, GPQA Diamond, and GSM8K using the lm-evaluation-harness framework with the SGLang engine.
Accuracy
| Benchmark | DeepSeek-R1 | DeepSeek-R1-MXFP4 (this model) | Recovery |
| --- | --- | --- | --- |
| AIME2024 | 78.00 | 76.00 | 97.44% |
| GPQA Diamond | 68.89 | 68.18 | 98.97% |
| GSM8K | 95.81 | 95.42 | 99.59% |
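The Recovery column is consistent with the ratio of the quantized score to the baseline score, expressed as a percentage. A short sketch that recomputes it from the scores in the table (the function name `recovery_percent` is chosen here for illustration):

```python
def recovery_percent(baseline, quantized):
    """Quantized score as a percentage of the baseline score, rounded to 2 decimals."""
    return round(quantized / baseline * 100, 2)


# (baseline DeepSeek-R1 score, DeepSeek-R1-MXFP4 score) from the table.
SCORES = {
    "AIME2024": (78.00, 76.00),
    "GPQA Diamond": (68.89, 68.18),
    "GSM8K": (95.81, 95.42),
}

recovery = {
    task: recovery_percent(baseline, quantized)
    for task, (baseline, quantized) in SCORES.items()
}
```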
Reproduction
The results were obtained using the following commands:
AIME2024
# starting server
python3 -m sglang.launch_server \
--model amd/DeepSeek-R1-MXFP4 \
--tp 8 \
--trust-remote-code \
--n-share-experts-fusion 8 \
--disable-radix-cache
# evaluating
lm_eval --model local-completions \
--model_args model=amd/DeepSeek-R1-MXFP4,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
--tasks aime24 \
--num_fewshot 0 \
--gen_kwargs "do_sample=True,temperature=0.6,top_p=0.95,max_tokens=32000" \
--batch_size auto \
--log_samples \
--output_path output_data/DeepSeek-R1-MXFP4
GPQA Diamond
lm_eval --model local-completions \
--model_args model=amd/DeepSeek-R1-MXFP4,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=32000,temperature=0.6,top_p=0.95 \
--tasks gpqa_diamond_cot_zeroshot \
--num_fewshot 0 \
--gen_kwargs "do_sample=True,temperature=0.6,top_p=0.95,max_tokens=32000,max_gen_toks=32000" \
--batch_size auto \
--log_samples \
--output_path output_data/DeepSeek-R1-MXFP4
GSM8K
lm_eval --model local-completions \
--model_args model=amd/DeepSeek-R1-MXFP4,base_url=http://localhost:30000/v1/completions,num_concurrent=999999,timeout=999999,tokenized_requests=False,max_length=8096 \
--tasks gsm8k \
--num_fewshot 5 \
--batch_size auto \
--log_samples \
--output_path output_data/DeepSeek-R1-MXFP4
License
Modifications Copyright(c) 2025 Advanced Micro Devices, Inc. All rights reserved.