something-else committed on
Commit ec31303 · verified
1 Parent(s): 1963f8e

Update README.md

Files changed (1)
  1. README.md +1 -0
README.md CHANGED
@@ -25,6 +25,7 @@ tags:
  - rwkv-v5-stp118-N8.pth : 3B rocm-rwkv model starting with the previous but now with 118 epochs of N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.750 for N8 and 79.508 GTokens.
  - rwkv-v5-stp146-N8.pth : 3B rocm-rwkv model starting with the previous but now with 146 epochs of N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.758 for N8 and 97.982 GTokens.
  - rwkv-v5-final-N8.pth : 3B rocm-rwkv model starting with the previous but now with the full N8 dataset epoch with --lr_init 3e-8 --lr_final 1e-8 This pth has a loss of 1.73 for the full N8 dataset with 106.098327552 GTokens.
+ - rwkv-3B-stp634-N8-3.pth : 3B rocm-rwkv model starting with the previous but now with the 104 GTokens of the N8-3 dataset with ctxt=4k. This pth has a loss of 1.92 for the N8-3 dataset.


  7B rocm-rwkv pth record: I called this model Tlanuwa since I added an extra training focusing on cherokee after each run.