
DeepSeek-R1-Distill-Qwen-7B-q4f16_ft-MLC

Model Configuration

| Parameter | Value |
|---|---|
| Source Model | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B |
| Inference API | MLC_LLM |
| Quantization | q4f16_ft |
| Model Type | qwen2 |
| Vocab Size | 152064 |
| Context Window Size | 131072 |
| Prefill Chunk Size | 8192 |
| Temperature | 0.6 |
| Repetition Penalty | 1.0 |
| top_p | 0.95 |
| pad_token_id | 0 |
| bos_token_id | 151646 |
| eos_token_id | 151643 |
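These values follow the fields MLC LLM writes to mlc-chat-config.json at the root of a compiled model directory. As a minimal sketch (the local directory name is assumed, and field nesting can vary between MLC LLM versions), you can inspect them directly:

```python
import json
from pathlib import Path

# Assumed local path to the downloaded/compiled model directory.
model_dir = Path("DeepSeek-R1-Distill-Qwen-7B-q4f16_ft-MLC")

# MLC LLM stores the configuration listed above in mlc-chat-config.json.
config = json.loads((model_dir / "mlc-chat-config.json").read_text())

print(config["quantization"])          # q4f16_ft
print(config["context_window_size"])   # 131072
print(config["temperature"])           # 0.6
print(config["top_p"])                 # 0.95
```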

See jetson-ai-lab.com/models.html for benchmarks, examples, and containers for deploying these quantized models for local serving and inference.
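As a rough sketch of local inference, assuming this repo is reachable at dusty-nv/DeepSeek-R1-Distill-Qwen-7B-q4f16_ft-MLC on the Hugging Face Hub and that your MLC LLM build supports the q4f16_ft quantization scheme (as the Jetson containers do), the OpenAI-style Python API can be used like this; the sampling values mirror the defaults in the table above:

```python
from mlc_llm import MLCEngine

# Assumed Hub path for this repo; a local compiled model directory also works.
model = "HF://dusty-nv/DeepSeek-R1-Distill-Qwen-7B-q4f16_ft-MLC"
engine = MLCEngine(model)

# Stream a chat completion using the defaults listed above
# (temperature 0.6, top_p 0.95); both can be overridden per request.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "Explain what q4f16_ft quantization trades off."}],
    model=model,
    temperature=0.6,
    top_p=0.95,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content or "", end="", flush=True)
print()

engine.terminate()
```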