Delete Config.json
#193 opened about 12 hours ago
by
jana0010
Update README.md
#192 opened 1 day ago
by
caraanchoa
Add <think>\n> tag to assistant responses to ensure consistency
#191 opened 1 day ago
by
REN0430
fix for transformers 4.49 compatibility
#189 opened 1 day ago
by
katuni4ka

Question about expert selection
#186 opened 3 days ago
by
waynebian
Hardware Requirements to run the original model - 671B params
2
#185 opened 3 days ago
by
EdilCamil

Holding paper in hand
#184 opened 4 days ago
by
Loveyl
Update config.json
#182 opened 4 days ago
by
Empolean2640
Regression in Reasoning Tag Output - Missing <think> in Model Responses
1
#181 opened 5 days ago
by
divinerapier
Delete model.safetensors.index.json
#180 opened 5 days ago
by
Huggingfaceliaj
Unknown quantization type, got fp8
#179 opened 6 days ago
by
DenisFavaCerchiaro
How to disable/skip the <think></think> process.
2
#178 opened 7 days ago
by
yech520
Request: DOI
#177 opened 7 days ago
by
Tamwyn
Request: DOI
#176 opened 8 days ago
by
saathwik
Request: DOI
#175 opened 9 days ago
by
Paulabad
Draft model as accelerator for DeepSeek-R1?
4
#174 opened 9 days ago
by
inputout

Does R1 support long context (> 4K)?
#172 opened 10 days ago
by
ghostplant
Deploying production ready service with Unsloth GGUF quants on your AWS account. (4 x L40S)
8
#171 opened 10 days ago
by
samagra-tensorfuse
Could you take a look at the "r1-1776" model released by Perplexity?
4
#170 opened 10 days ago
by
yanyihan
Just crossed 10,000 likes!
1
#169 opened 11 days ago
by
clem

Unable to download flash_attn on Mac
#168 opened 12 days ago
by
earlyIsLate
Can this model be used for commercial use?
2
#167 opened 12 days ago
by
henrycwf

90+ tokens per second for MI300x8 using batch_size = 1
1
#166 opened 13 days ago
by
ghostplant
"aha moment" comment deleted by Perplexity (recovered)
3
#164 opened 14 days ago
by
FalconNet
'num_hidden_layers': 61, but layer 62 has weights.
#162 opened 14 days ago
by
xinhe
Upload GTG Breaking every Limit
#161 opened 14 days ago
by
GTGenesis
support prefix complete
3
#158 opened 16 days ago
by
HuggineAllen
Create app.py
#157 opened 18 days ago
by
SpaceAgeRobotics

Brokersponsor
#155 opened 18 days ago
by
Brokersponsor

Update README.md
#154 opened 18 days ago
by
egegvner
Upload IMG_4530.png
#152 opened 19 days ago
by
Noemie202586
Upload IMG_1745.JPG
#151 opened 19 days ago
by
Ladib
Create Clara
1
#150 opened 19 days ago
by
Clblinks
If I understand correctly, evaluating MATH-500 requires 64*500 model calls?
1
#149 opened 21 days ago
by
Rorschaaaach
Request: DOI
#148 opened 21 days ago
by
Tarush-Appreciate
Update README.md
#147 opened 21 days ago
by
tekno-power
Update README.md
#146 opened 22 days ago
by
Ekimnedops6969
Adding <think>\n after chat template will cause vllm to not return reasoning_content (null) when reasoning
6
#144 opened 22 days ago
by
kebeliu
Update README.md
1
#143 opened 22 days ago
by
MuhammadEhsan

Request for Information on Purchasing Reasoning API Key
1
#142 opened 23 days ago
by
brahamaandai

Update model_max_length in tokenizer_config.json
#139 opened 24 days ago
by
kkokkie2360
Host of the model
3
#138 opened 24 days ago
by
henrycwf

Lite version for DeepSeek-R1?
1
#137 opened 25 days ago
by
haili-tian
[Bug] assert not self.training
3
#136 opened 25 days ago
by
Gaie

Upload IMG_0253.HEIC
#134 opened 25 days ago
by
rynty