something-else committed on
Commit
9e12a15
1 Parent(s): eadf500

Update README.md

  1. README.md +8 -0
README.md CHANGED
@@ -44,3 +44,11 @@ tags:
44
  - rwkv-9Q-16k-step6-0-4.pth: Using rwkv-9Q-4k-stp248.pth, I added N-0 and N-8 with Ctx=16384 (loss=1.65). This model appears to chat better.
45
  - rwkv-9Q-step607-N8-3.pth: Using rwkv-9Q-16k-step6-0-4.pth, I added 100G tokens of N8-3.
46
  - rwkv-9Q-4k-stp662-N8-3.pth: Using rwkv-9Q-step607-N8-3.pth, I added 10G more tokens of N8-3.
47
+
48
+ V6 models:
49
+
50
+
51
+ 6B rocm-rwkv pth record: 12 layers, embd=6144, ctx=4096.
52
+
53
+ - rwkv-6B-N3-final.pth: 6B rocm-rwkv model trained with N3, reaching a final loss=3.56 after 100G tokens.
54
+ - rwkv-6B-N0-final.pth: Starting from the previous pth, the rocm-rwkv model was trained with N0, reaching a final loss=3.11 after 100G tokens.
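
The final losses quoted for the two V6 checkpoints can be turned into perplexities for a rough sense of scale. This is only a sketch: it assumes the reported loss is a mean per-token cross-entropy in nats, which the README does not state, and the `perplexity` helper is a hypothetical name introduced here. If each loss was measured on its own training data (N3 and N0 respectively), the two numbers are not directly comparable.

```python
import math

# Final losses as quoted in the README entries for the V6 checkpoints.
final_losses = {
    "rwkv-6B-N3-final.pth": 3.56,
    "rwkv-6B-N0-final.pth": 3.11,
}


def perplexity(loss_nats: float) -> float:
    """Convert a mean cross-entropy loss to perplexity.

    Assumption: the loss is expressed in nats per token (natural log).
    """
    return math.exp(loss_nats)


if __name__ == "__main__":
    for name, loss in final_losses.items():
        print(f"{name}: loss={loss:.2f} -> perplexity ~ {perplexity(loss):.1f}")
```

A lower perplexity means the model is, on average, less uncertain about the next token on the data the loss was measured on.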