deepseek-ai/DeepSeek-R1

Delete Config.json

#193 opened about 12 hours ago by

jana0010

Update README.md

#192 opened 1 day ago by

caraanchoa

Add <think>\n> tag to assistant responses to ensure consistency

#191 opened 1 day ago by

REN0430

fix for transformers 4.49 compatibility

#189 opened 1 day ago by

katuni4ka

MLLM discussion group

#188 opened 2 days ago by

YLHX

Question about experts select

#186 opened 3 days ago by

waynebian

Hardware Requirements to run the original model - 671B params

2

#185 opened 3 days ago by

EdilCamil

Holding paper in hand

#184 opened 4 days ago by

Loveyl

Update config.json

#182 opened 4 days ago by

Empolean2640

Regression in Reasoning Tag Output - Missing <think> in Model Responses

1

#181 opened 5 days ago by

divinerapier

Delete model.safetensors.index.json

#180 opened 5 days ago by

Huggingfaceliaj

Unknown quantization type, got fp8

#179 opened 6 days ago by

DenisFavaCerchiaro

How to disable or skip the <think></think> process

2

#178 opened 7 days ago by

yech520

Request: DOI

#177 opened 7 days ago by

Tamwyn

Request: DOI

#176 opened 8 days ago by

saathwik

Request: DOI

#175 opened 9 days ago by

Paulabad

Draft model as accelerator for DeepSeek-R1?

4

#174 opened 9 days ago by

inputout

Does R1 support long context (> 4K)?

#172 opened 10 days ago by

ghostplant

Deploying production ready service with Unsloth GGUF quants on your AWS account. (4 x L40S)

8

#171 opened 10 days ago by

samagra-tensorfuse

Could you take a look at the "r1-1776" model released by Perplexity?

4

#170 opened 10 days ago by

yanyihan

Just crossed 10,000 likes!

1

#169 opened 11 days ago by

clem

Unable to download flash_attn on Mac

#168 opened 12 days ago by

earlyIsLate

Can this model be used for commercial use?

2

#167 opened 12 days ago by

henrycwf

90+ tokens per second for MI300x8 using batch_size = 1

1

#166 opened 13 days ago by

ghostplant

RytryR1

#165 opened 14 days ago by

Rocka01

"aha moment" comment deleted by Perplexity (recovered)

3

#164 opened 14 days ago by

FalconNet

Garbled output

1

#163 opened 14 days ago by

cell22

'num_hidden_layers': 61, but layer 62 has weights.

#162 opened 14 days ago by

xinhe

Upload GTG Breaking every Limit

#161 opened 14 days ago by

GTGenesis

support prefix complete

3

#158 opened 16 days ago by

HuggineAllen

Create app.py

#157 opened 18 days ago by

SpaceAgeRobotics

Create 1

#156 opened 18 days ago by

madevii

Brokersponsor

#155 opened 18 days ago by

Brokersponsor

Update README.md

#154 opened 18 days ago by

egegvner

Upload IMG_4530.png

#152 opened 19 days ago by

Noemie202586

Upload IMG_1745.JPG

#151 opened 19 days ago by

Ladib

Create Clara

1

#150 opened 19 days ago by

Clblinks

If I understand correctly, evaluating MATH-500 requires 64*500 model calls?

1

#149 opened 21 days ago by

Rorschaaaach

Request: DOI

#148 opened 21 days ago by

Tarush-Appreciate

Update README.md

#147 opened 21 days ago by

tekno-power

Update README.md

#146 opened 22 days ago by

Ekimnedops6969

Adding <think>\n after chat template will cause vllm to not return reasoning_content (null) when reasoning

6

#144 opened 22 days ago by

kebeliu

Update README.md

1

#143 opened 22 days ago by

MuhammadEhsan

Request for Information on Purchasing Reasoning API Key

1

#142 opened 23 days ago by

brahamaandai

ssss

1

#140 opened 24 days ago by

DZGT

Update model_max_length in tokenizer_config.json

#139 opened 24 days ago by

kkokkie2360

Host of the model

3

#138 opened 24 days ago by

henrycwf

Lite version for DeepSeek-R1?

1

#137 opened 25 days ago by

haili-tian

[Bug] assert not self.training

3

#136 opened 25 days ago by

Gaie

Upload IMG_0253.HEIC

#134 opened 25 days ago by

rynty
