something-else
committed on
Commit 84737d8
Parent(s): 144c88e
Update README.md
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
 - rocm-rwkv
 - 3B-rwkv
 ---
-3B rocm-rwkv pth record.
+3B rocm-rwkv pth record. This 3B differs slightly from the usual 3B: it has 48 layers, an embedding size of 2048, and a context length of 16384 (I think all of the .pth files have the same context size).
 - rwkv-final-chnk5.pth: 3B rocm-rwkv model trained on Slim pajama chunk1-5, with a loss of 2.456.
 - rwkv-final-chnk17.pth: 3B rocm-rwkv model trained on Slim pajama chunk1-10 for the first epoch, then additionally on chunk1-7 after the first epoch, with a loss of 2.281.
 - rwkv-code39-16012024.pth: 3B rocm-rwkv model trained on Slim pajama chunk1-10 for the first epoch, then additionally on chunk1-8 after the first epoch, plus a little bit of code. This pth has a loss of 1.174 for code alone and 2.26 for text.
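The layer count and embedding size stated in the updated README line could be checked against a checkpoint with something like the sketch below. It is only an illustration under stated assumptions: the parameter names (`emb.weight`, `blocks.<i>.…`) follow the naming commonly seen in RWKV-LM checkpoints and are not confirmed by this README, `summarize_checkpoint` is a hypothetical helper, and the vocabulary size in the stand-in data is made up. The sketch does not try to recover the context length.

```python
import re


def summarize_checkpoint(shapes):
    """Infer layer count and embedding width from a {param_name: shape} mapping.

    Assumption: parameters are named "emb.weight" and "blocks.<i>....",
    as in typical RWKV-LM checkpoints. The README does not state this.
    """
    layer_ids = set()
    for name in shapes:
        match = re.match(r"blocks\.(\d+)\.", name)
        if match:
            layer_ids.add(int(match.group(1)))

    # Layers are numbered from 0, so the count is the highest index plus one.
    n_layer = max(layer_ids) + 1 if layer_ids else 0

    # The embedding matrix is assumed to have shape (vocab_size, n_embd).
    emb_shape = shapes.get("emb.weight")
    n_embd = emb_shape[1] if emb_shape is not None else None

    return {"n_layer": n_layer, "n_embd": n_embd}


# Stand-in shapes matching the numbers in the README (48 layers, width 2048).
# The vocabulary size 50277 is a placeholder, not taken from the README.
fake_shapes = {"emb.weight": (50277, 2048)}
for i in range(48):
    fake_shapes[f"blocks.{i}.ln1.weight"] = (2048,)

summary = summarize_checkpoint(fake_shapes)
print(summary)  # {'n_layer': 48, 'n_embd': 2048}
```

For a real file, and assuming the `.pth` holds a plain PyTorch state dict, the mapping could be built with `{k: tuple(v.shape) for k, v in torch.load(path, map_location="cpu").items()}` and passed to the same helper.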