Paper and Model card show different models that were further pre-trained with Code Data
#1 opened by Wusul
In the paper, it is stated that "DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens," but the model card says that "DeepSeek-Coder-V2 is further pre-trained from DeepSeek-Coder-V2-Base with 6 trillion tokens." These statements name different base models, so one of them appears to be incorrect.
Wusul changed discussion status to closed