Commit 941052d
Parent(s): 3aaa6df
Update README.md
README.md
CHANGED
@@ -28,6 +28,7 @@ tags:
 
 
 7B rocm-rwkv pth record: I called this model Tlanuwa since I added an extra training focusing on Cherokee after each run.
+- rwkv-7BTlanuwa-1k-soup91-Final.pth: 7B model, 32 layers, embd=4096, ctx=16384. This has all the same training as the 3B, but with only SlimPajama chunks 1-9, probably more than 2T tokens, and a loss of 2.834 with respect to the full soup91. I am working on getting a lower loss.
 
 9B rocm-rwkv pth record: 40 layers, embd=4096, ctx=16384. I am calling this model Quetzal since it is a green model that flies, and I am adding an extra training focusing on Spanish and the Axolotl-Spanish-Nahuatl dataset after each run.
 - rwkv-9Q-stp101-N8.pth: 9B rocm-rwkv model trained with SlimPajama chunks 1-10 for the first epoch and an additional training with chunks 1-2 and a mix of multi-language and code; after that I am using the N8 dataset, currently at 4.222 GTokens. This pth has a loss of 1.904 on the N8 dataset.
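For anyone who wants to try the checkpoint this commit adds, a minimal loading sketch follows. Whether the rocm-rwkv .pth files load unmodified with the community `rwkv` pip package, and which tokenizer file they pair with, are assumptions on my part, not something the README states.

```python
# Minimal sketch of loading the added checkpoint with the community
# `rwkv` pip package (pip install rwkv). Assumptions: the rocm-rwkv .pth
# follows the standard RWKV checkpoint layout, and the tokenizer file
# name below is hypothetical; check the repo for the real one.
from rwkv.model import RWKV
from rwkv.utils import PIPELINE

# The package expects the model path without the .pth extension;
# 'cpu fp32' is the safest strategy, swap for 'cuda fp16' on GPU.
model = RWKV(model='rwkv-7BTlanuwa-1k-soup91-Final', strategy='cpu fp32')
pipeline = PIPELINE(model, '20B_tokenizer.json')  # tokenizer choice is an assumption

print(pipeline.generate('The RWKV architecture is', token_count=40))
```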