Paper and Model card show different models that were further pre-trained with Code Data
#1 opened by Wusul
In the paper, it is stated that "DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens," but the model card says that "DeepSeek-Coder-V2 is further pre-trained from DeepSeek-Coder-V2-Base with 6 trillion tokens." These statements name different base models, so one of them appears to be incorrect.
Wusul changed discussion status to closed