DeepSeek-R1-Distill-Qwen-7B-q4f16_ft-MLC
| Model Configuration | Value |
|---|---|
| Source Model | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B |
| Inference API | MLC_LLM |
| Quantization | q4f16_ft |
| Model Type | qwen2 |
| Vocab Size | 152064 |
| Context Window Size | 131072 |
| Prefill Chunk Size | 8192 |
| Temperature | 0.6 |
| Repetition Penalty | 1.0 |
| top_p | 0.95 |
| pad_token_id | 0 |
| bos_token_id | 151646 |
| eos_token_id | 151643 |
See jetson-ai-lab.com/models.html for benchmarks, examples, and containers for deploying local serving and inference with these quantized models.
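As a minimal sketch of how the sampling parameters in the table above could map onto a request against a locally served instance of this model (MLC LLM's serve mode exposes an OpenAI-compatible `/v1/chat/completions` endpoint): the server URL and model id below are assumptions, so adjust them to match your deployment.

```python
import json

# Assumed local endpoint for an MLC LLM server; change to match your setup.
SERVER_URL = "http://localhost:8000/v1/chat/completions"


def build_request(prompt: str) -> dict:
    """Build a chat-completion payload using the sampling parameters
    from the configuration table above (temperature 0.6, top_p 0.95)."""
    return {
        # Hypothetical model id -- use whatever name your server registers.
        "model": "DeepSeek-R1-Distill-Qwen-7B-q4f16_ft-MLC",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,  # matches the table
        "top_p": 0.95,       # matches the table
        "stream": False,
    }


# Inspect the payload that would be POSTed to SERVER_URL.
payload = build_request("Why is the sky blue?")
print(json.dumps(payload, indent=2))
```

Sending this payload with any HTTP client (e.g. `requests.post(SERVER_URL, json=payload)`) returns an OpenAI-style completion object; repetition penalty 1.0 is the default, so it is omitted from the request.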