Commit 02b85c8 (parent: 3f2762f): Update README.md
- rwkv-9Q-stp101-N8.pth: 9B rocm-rwkv model trained on SlimPajama chunks 1-10 for the first epoch, followed by additional training on chunks 1-2 plus a mix of multi-language and code data; after that I am training on the N8 dataset. I am currently at 4.222 GTokens of N8. This checkpoint has a loss of 1.904 on the N8 dataset.
- rwkv-9Q-1k-stp307-1k-N8.pth: 9B rocm-rwkv model trained on SlimPajama chunks 1-10 for the first epoch, followed by additional training on chunks 1-2 plus a mix of multi-language and code data; after that I am training on the N8 dataset. I am currently at 12.706 GTokens of N8. This checkpoint has a loss of 1.871 on the N8 dataset.
- rwkv-9Q-Soup91-step298.pth: starting from rwkv-9Q-1k-stp307-1k-N8.pth, I added 298 epoch steps of my data soup (code + math + instruct + chain of thought), 12.283 GTokens, reaching a loss of 2.242.
- rwkv-9Q-Soup91-Final.pth: starting from rwkv-9Q-Soup91-step298.pth, I continued the data soup (code + math + instruct + chain of thought) from step 298 to step 1035, 42.733 GTokens, reaching a loss of 2.222.
- rwkv-9Q-stp1447-N8.pth: starting from rwkv-9Q-Soup91-Final.pth, I added 1447 steps of N8, 59.733 GTokens, reaching a loss of 1.827.
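For quick comparison, the figures above can be tabulated in a small script. This is only a sketch for readers of this card: the numbers are copied verbatim from the list (the `.pth` files themselves are not loaded here), and note that the losses are measured on different datasets (N8 vs. the data soup), so they are not directly comparable across phases.

```python
# Checkpoint stats copied from the list above: (GTokens in that phase, reported loss).
checkpoints = {
    "rwkv-9Q-stp101-N8.pth": (4.222, 1.904),
    "rwkv-9Q-1k-stp307-1k-N8.pth": (12.706, 1.871),
    "rwkv-9Q-Soup91-step298.pth": (12.283, 2.242),
    "rwkv-9Q-Soup91-Final.pth": (42.733, 2.222),
    "rwkv-9Q-stp1447-N8.pth": (59.733, 1.827),
}

# Checkpoint with the lowest reported loss.
best = min(checkpoints, key=lambda name: checkpoints[name][1])
print(best)  # rwkv-9Q-stp1447-N8.pth
```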