Commit 4f49839 (parent: 728bc51): Update README.md

README.md changed:
- rwkv-coder-63x1-104-29012024.pth: 3B rocm-rwkv model starting from rwkv-HHMIX-63x1-47-29012024.pth plus more code (71.21 Gtokens of code). This model has a loss of 1.090 on the code dataset.
- rwkv-final_HHMIX_chuk3.pth: 3B rocm-rwkv model starting from rwkv-coder-63x1-104-29012024.pth plus a mix of multilingual and code data. This model has a loss of 1.836 on the code+multilingual dataset.
- rwkv-1epoch_N8_wrong_lr.pth: 3B rocm-rwkv model starting from the previous one (possibly with more code or random multilingual data; I don't remember exactly) plus 3 additional chunks of my mix of multilingual (random) and code data, plus 3 chunks of my dataset soup: multilingual (only languages whose scripts differ from the English or Latin-Greek alphabets, e.g. Japanese, Cherokee, etc.) + code + math + instruct + chain of thought. This model has 1 epoch (step) on the N8 dataset, but with --lr_init 5e-7 --lr_final 5e-8. This pth has a loss of 1.978 on N8.
- rwkv-v5-stp2-N8.pth: 3B rocm-rwkv model starting from the previous one plus two epochs of the N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.94 on N8.
- rwkv-v5-stp5-N8.pth: 3B rocm-rwkv model starting from the previous one, but now with 5 epochs of the N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.90 on N8.
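The `--lr_init`/`--lr_final` pairs above describe each run's learning-rate schedule. As a rough illustration only, here is a minimal sketch of such a schedule, assuming a geometric (exponential) interpolation between the two endpoints; the helper name `lr_at_step` is hypothetical and not part of any training script:

```python
def lr_at_step(step, total_steps, lr_init, lr_final):
    """Hypothetical helper: interpolate the learning rate geometrically
    from lr_init (at step 0) to lr_final (at total_steps)."""
    if total_steps <= 0 or lr_init == lr_final:
        # constant schedule, e.g. the stp2/stp5 runs (7e-6 -> 7e-6)
        return lr_init
    progress = step / total_steps
    # exponential decay between the two endpoints
    return lr_init * (lr_final / lr_init) ** progress

# e.g. the rwkv-1epoch_N8_wrong_lr.pth run decays 5e-7 -> 5e-8,
# while the stp2/stp5 runs hold 7e-6 throughout.
start = lr_at_step(0, 1000, 5e-7, 5e-8)
end = lr_at_step(1000, 1000, 5e-7, 5e-8)
flat = lr_at_step(500, 1000, 7e-6, 7e-6)
```

With a tenfold decay as in the wrong-lr run, the rate at the halfway point is lr_init divided by sqrt(10), not the arithmetic midpoint.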