something-else/rocm-rwkv

something-else committed on Mar 23

Commit 84737d8 • 1 Parent(s): 144c88e

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -7,7 +7,7 @@ tags:
 - rocm-rwkv
 - 3B-rwkv
 ---
-3B rocm-rwkv pth record.
+3B rocm-rwkv pth record. This 3B is a little different than the usual 3B. This 3B model have 48 Layers, embd of 2048 and Ctxt of 16384 (I think that all pth have the same ctxt size).
 - rwkv-final-chnk5.pth: 3B rocm-rwkv model trained with Slim pajama chunk1-5 and with a loss of 2.456.
 - rwkv-final-chnk17.pth: 3B rocm-rwkv model trained with Slim pajama chunk1-10 for the first epoch and an aditional training with chunk1-7 after the first epoch  and with a loss of 2.281
 - rwkv-code39-16012024.pth:  3B rocm-rwkv model trained with Slim pajama chunk1-10 for the first epoch and an aditional training with chunk1-8 after the first epoch; plus a little bit of code. This pth has a loss of 1.174 for code alone and 2.26 for text.
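
The added README line states 48 layers and an embedding size of 2048. A minimal sketch of how those two numbers could be read back from a checkpoint's parameter names and shapes follows. It assumes the common RWKV naming scheme (`emb.weight` plus `blocks.<i>.` prefixes), which this repository does not confirm; `summarize_shapes` and `toy_shapes` are hypothetical names, and the vocabulary size in the toy data is a placeholder. The context length is not derived here, since it may not be recoverable from weight shapes.

```python
import re


def summarize_shapes(shapes):
    """Infer layer count and embedding size from a {param_name: shape} mapping.

    Assumes RWKV-style names: "emb.weight" and "blocks.<i>.<...>".
    """
    layer_ids = set()
    for name in shapes:
        match = re.match(r"blocks\.(\d+)\.", name)
        if match:
            layer_ids.add(int(match.group(1)))

    # The embedding matrix is assumed to have shape (vocab_size, n_embd).
    vocab_size, n_embd = shapes["emb.weight"]
    n_layer = max(layer_ids) + 1 if layer_ids else 0
    return {"n_layer": n_layer, "n_embd": n_embd, "vocab_size": vocab_size}


# Toy stand-in for a real checkpoint; 1000 is a placeholder vocabulary size.
toy_shapes = {"emb.weight": (1000, 2048)}
for i in range(48):
    toy_shapes[f"blocks.{i}.att.key.weight"] = (2048, 2048)

summary = summarize_shapes(toy_shapes)
print(summary)
```

With PyTorch installed, the mapping for a real file could be built as `{k: tuple(v.shape) for k, v in torch.load(path, map_location="cpu").items()}`, assuming the `.pth` file holds a plain state dict.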