something-else/rocm-rwkv

something-else committed on Mar 22

Commit f80d2c7 • 1 Parent(s): a5c2924

Update README.md

Files changed (1)

README.md +2 -1

@@ -17,4 +17,5 @@ tags:
 - rwkv-1epoch_N8_wrong_lr.pth: rwkv-v5-stp2-N8.pth : 3B rocm-rwkv model starting with the previous one (I think I may have added more code or random multilingual data, I don't remember) plus an additional 3 chunks of my mix of multi-language (random) and code + 3 chunks of my dataset soup (multilingual (only languages with characters different from the English or Latin-Greek alphabet, e.g. Japanese, Cherokee, etc.) + code + math + instruct + chain of thought). This model has 1 epoch (step) on the N8 dataset but with --lr_init 5e-7 --lr_final 5e-8. This pth has a loss of 1.978 for N8.
 - rwkv-v5-stp2-N8.pth : 3B rocm-rwkv model starting with the previous one + two epochs of N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.94 for N8.
 - rwkv-v5-stp5-N8.pth : 3B rocm-rwkv model starting with the previous but now with 5 epochs of N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.90 for N8.
-- rwkv-v5-stp18-N8.pth : 3B rocm-rwkv model starting with the previous but now with 18 epochs of N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.827 for N8 and 13.377 GTokens.

+- rwkv-v5-stp18-N8.pth : 3B rocm-rwkv model starting with the previous but now with 18 epochs of N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.827 for N8 and 13.377 GTokens.
+- - rwkv-v5-stp32-N8.pth : 3B rocm-rwkv model starting with the previous but now with 32 epochs of N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.810 for N8 and 22.46 GTokens.
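
To make the trend across the checkpoints listed in this hunk easier to read, here is a minimal Python sketch that tabulates the reported N8 loss and its change from one entry to the next. The names, epoch counts, and losses are copied from the README lines above. The `checkpoints` list and the `loss_deltas` helper are illustrative and not part of the repository, and the first entry is labelled with the first of the two file names that appear on its line. Epoch counts are taken as stated, without assuming whether they are cumulative.

```python
# Figures copied from the README entries in this diff.
# (checkpoint name, epochs on N8 as stated, reported N8 loss)
# Note: the first entry was trained with --lr_init 5e-7 --lr_final 5e-8,
# the others with --lr_init 7e-6 --lr_final 7e-6.
checkpoints = [
    ("rwkv-1epoch_N8_wrong_lr.pth", 1, 1.978),
    ("rwkv-v5-stp2-N8.pth", 2, 1.94),
    ("rwkv-v5-stp5-N8.pth", 5, 1.90),
    ("rwkv-v5-stp18-N8.pth", 18, 1.827),
    ("rwkv-v5-stp32-N8.pth", 32, 1.810),
]


def loss_deltas(rows):
    """Return (name, epochs, loss, change vs. previous entry) per row.

    Hypothetical helper for illustration; the change is None for the
    first row because there is nothing to compare it with.
    """
    out = []
    prev_loss = None
    for name, epochs, loss in rows:
        delta = None if prev_loss is None else round(loss - prev_loss, 3)
        out.append((name, epochs, loss, delta))
        prev_loss = loss
    return out


if __name__ == "__main__":
    for name, epochs, loss, delta in loss_deltas(checkpoints):
        change = "" if delta is None else f"{delta:+.3f}"
        print(f"{name:30s} {epochs:>3d} {loss:.3f} {change}")
```

The reported loss drops at every step in the list, with the smallest change between the 18-epoch and 32-epoch checkpoints.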