something-else commited on
Commit
a6f5f78
1 Parent(s): 65c1603

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +3 -1
README.md CHANGED
@@ -10,4 +10,6 @@ tags:
10
  3B rocm-rwkv pth record.
11
  - rwkv-final-chnk5.pth: 3B rocm-rwkv model trained with Slim pajama chunk1-5 and with a loss of 2.456.
12
  - rwkv-final-chnk17.pth: 3B rocm-rwkv model trained with Slim pajama chunk1-10 for the first epoch and an aditional training with chunk1-7 after the first epoch and with a loss of 2.281
13
- - rwkv-code39-16012024.pth: 3B rocm-rwkv model trained with Slim pajama chunk1-10 for the first epoch and an aditional training with chunk1-8 after the first epoch; plus a little bit of code. This pth has a loss of 1.174 for code alone and 2.26 for text.
 
 
 
10
  3B rocm-rwkv pth record.
11
  - rwkv-final-chnk5.pth: 3B rocm-rwkv model trained with Slim pajama chunk1-5 and with a loss of 2.456.
12
  - rwkv-final-chnk17.pth: 3B rocm-rwkv model trained with Slim pajama chunk1-10 for the first epoch and an aditional training with chunk1-7 after the first epoch and with a loss of 2.281
13
+ - rwkv-code39-16012024.pth: 3B rocm-rwkv model trained with Slim pajama chunk1-10 for the first epoch and an aditional training with chunk1-8 after the first epoch; plus a little bit of code. This pth has a loss of 1.174 for code alone and 2.26 for text.
14
+ - rwkv-HHMIX-63x1-47-29012024.pth: 3B rocm-rwkv model starting with rwkv-code39-16012024.pth plus a mix of multi-language and code. This model have a loss value of 2.065 for the code+multilingual dataset.
15
+ - rwkv-coder-63x1-104-29012024.pth: 3B rocm-rwkv model starting with rwkv-HHMIX-63x1-47-29012024.pth plus more code (71.21 Gtokens of code).