Paper and Model card show different models that were further pre-trained with Code Data

#1 opened by Wusul

In the paper, it's stated that "DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with additional 6 trillion tokens.", but the model card says that "DeepSeek-Coder-V2 is further pre-trained from DeepSeek-Coder-V2-Base with 6 trillion tokens".

Wusul changed discussion status to closed
