something-else/rocm-rwkv

something-else committed on Jul 12, 2024

Commit 9e12a15 · verified · 1 Parent(s): eadf500

Update README.md

Files changed (1)

README.md +8 -0

@@ -44,3 +44,11 @@ tags:
 - rwkv-9Q-16k-step6-0-4.pth: Starting from rwkv-9Q-4k-stp248.pth, I added N-0 and N-8 with Ctx=16384; loss=1.65. This model appears to chat better.
 - rwkv-9Q-step607-N8-3.pth: Starting from rwkv-9Q-16k-step6-0-4.pth, I added 100G tokens of N8-3.
 - rwkv-9Q-4k-stp662-N8-3.pth: Starting from rwkv-9Q-step607-N8-3.pth, I added 10G more tokens of N8-3.

+V6 models:
+6B rocm-rwkv pth record: 12 layers, embd=6144, ctx=4096.
+- rwkv-6B-N3-final.pth: 6B rocm-rwkv model trained with N3; final loss=3.56 after 100G tokens.
+- rwkv-6B-N0-final.pth: starting from the previous pth (rwkv-6B-N3-final.pth), rocm-rwkv trained with N0; final loss=3.11 after 100G tokens.
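
The checkpoint entries in this README describe a chain in which each file continues training from an earlier one. As a reading aid, that lineage can be sketched as a small lookup table. This is only an illustration: `PARENT` and `lineage` are hypothetical names, not part of the repository, and `None` marks a checkpoint whose starting point is not stated in this excerpt.

```python
# Parent checkpoint for each file, transcribed from the README entries.
# None means the excerpt does not say what the checkpoint started from.
PARENT = {
    "rwkv-9Q-4k-stp248.pth": None,
    "rwkv-9Q-16k-step6-0-4.pth": "rwkv-9Q-4k-stp248.pth",
    "rwkv-9Q-step607-N8-3.pth": "rwkv-9Q-16k-step6-0-4.pth",
    "rwkv-9Q-4k-stp662-N8-3.pth": "rwkv-9Q-step607-N8-3.pth",
    "rwkv-6B-N3-final.pth": None,
    "rwkv-6B-N0-final.pth": "rwkv-6B-N3-final.pth",
}


def lineage(name):
    """Return the chain of checkpoints from the earliest known ancestor to `name`."""
    chain = []
    while name is not None:
        chain.append(name)
        name = PARENT[name]  # raises KeyError for a file not listed above
    return list(reversed(chain))


if __name__ == "__main__":
    for leaf in ("rwkv-9Q-4k-stp662-N8-3.pth", "rwkv-6B-N0-final.pth"):
        print(" -> ".join(lineage(leaf)))
```

Keeping the table as plain data makes it easy to extend when a later commit adds another checkpoint entry.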