Commit 24a33b7 (parent: b20be4e) · Update README.md
README.md CHANGED
@@ -25,7 +25,8 @@ tags:
 - rwkv-v5-stp118-N8.pth : 3B rocm-rwkv model starting from the previous checkpoint, now with 118 epochs of the N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.750 for N8 and 79.508 GTokens.
 - rwkv-v5-stp146-N8.pth : 3B rocm-rwkv model starting from the previous checkpoint, now with 146 epochs of the N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.758 for N8 and 97.982 GTokens.
 - rwkv-v5-final-N8.pth : 3B rocm-rwkv model starting from the previous checkpoint, now with the full N8 dataset epoch with --lr_init 3e-8 --lr_final 1e-8. This pth has a loss of 1.73 for the full N8 dataset with 106.098327552 GTokens.
 - rwkv-3B-stp634-N8-3.pth : 3B rocm-rwkv model starting from the previous checkpoint, now with the 104 GTokens of the N8-3 dataset with ctx=4k. This pth has a loss of 1.92 for the N8-3 dataset.
+- rwkv-3B-4K-stp802-N8-3.pth: Using rwkv-3B-stp634-N8-3.pth, I added 7 more GTokens of N8-3.


 7B rocm-rwkv pth record: I called this model Tlanuwa since I added an extra training run focusing on Cherokee after each run.
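Since each entry above is a raw .pth checkpoint, a quick way to sanity-check a download is to open it with PyTorch and look at the tensor names and shapes. A minimal sketch, assuming the file is a plain state dict (the usual layout for RWKV checkpoints); the filename is taken from the list above and the path is an assumption:

```python
import torch

# Minimal sketch: inspect a downloaded checkpoint from the list above.
# Assumes the file is a plain PyTorch state dict; the local path is hypothetical.
state = torch.load("rwkv-3B-4K-stp802-N8-3.pth", map_location="cpu")

total = 0
for name, tensor in state.items():
    total += tensor.numel()
    print(name, tuple(tensor.shape))

# A rough parameter count; for the files in this list it should land near 3 B.
print(f"~{total / 1e9:.2f} B parameters")
```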
@@ -42,3 +43,4 @@ tags:
 - rwkv-9Q-4k-stp248.pth: Using rwkv-9Q-1k-stp706-N8-0.pth, I added 2048 new steps (40.66 GTokens) with a loss of 1.717 on the Nathan-0 dataset and Ctx=4096.
 - rwkv-9Q-16k-step6-0-4.pth: Using rwkv-9Q-4k-stp248.pth, I added N-0 and N-8 with Ctx=16384; loss = 1.65. This model seems to chat better.
 - rwkv-9Q-step607-N8-3.pth: Using rwkv-9Q-16k-step6-0-4.pth, I added 100 GTokens of N8-3.
+- rwkv-9Q-4k-stp662-N8-3.pth: Using rwkv-9Q-step607-N8-3.pth, I added 10 more GTokens of N8-3.
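For trying out one of these 7B checkpoints in chat mode, here is a hedged sketch using the community `rwkv` pip package, a common inference path for RWKV v5 weights. The checkpoint name, strategy string, and prompt are assumptions, not part of this repo; on ROCm builds of PyTorch the 'cuda' strategy runs on HIP devices.

```python
from rwkv.model import RWKV
from rwkv.utils import PIPELINE

# Sketch only: the model path (given without the .pth extension, which the
# package resolves itself) and the strategy are assumptions; use 'cpu fp32'
# if no GPU is available.
model = RWKV(model="rwkv-9Q-4k-stp662-N8-3", strategy="cuda fp16")
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")  # RWKV "world" tokenizer

print(pipeline.generate("The Tlanuwa model was trained to", token_count=64))
```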