crumb committed
Commit 3c0e761 · 1 Parent(s): 5e88deb

Update README.md

Files changed (1)
  1. README.md +0 -3
README.md CHANGED
@@ -16,9 +16,6 @@ Who needs em, we all have em, they're just like us. Unusable models, compute opt
 
 The B, C, and D classes are derived from the tokens per model ratio from LLaMA, as LLaMA 65B is nearly Chinchilla-optimal with a ratio of 21 x Million Params tokens in training. Descending down the model sizes per training set for each model gives us these classes.
 
-We further project E-Class to have a ratio of 264, and F-Class to have a ratio of 490.
-
-
 | Model Name | Parameters | Class | Ratio | Tokens | Batch Size (Tokens) | Training Loss |
 | --- | --- | --- | --- | --- | --- | --- |
 | GerbilLab/Gerbil-A-3.3m | 3.3m | A-Class | 20 | 60M | 65.5k | 6.6644 |
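The context paragraph in the diff turns a tokens-per-parameter ratio into a training-token budget (tokens = ratio × parameters). A minimal sketch of that arithmetic, where `training_tokens` is a hypothetical helper name for illustration and not code from this repo:

```python
# Check of the tokens-per-parameter ratios quoted in the diff above.
# `training_tokens` is an illustrative helper, not part of the repo.

def training_tokens(params: float, ratio: float) -> float:
    """Training-token budget implied by a tokens-per-parameter ratio."""
    return params * ratio

# LLaMA 65B at ratio 21: 65e9 * 21 = 1.365e12 tokens, close to the
# ~1.4T tokens it was trained on (hence "nearly Chinchilla-optimal").
print(f"{training_tokens(65e9, 21):.3e}")   # 1.365e+12

# Gerbil-A-3.3m at the A-Class ratio of 20: 3.3e6 * 20 = 66M tokens,
# in the neighborhood of the 60M listed in the table.
print(f"{training_tokens(3.3e6, 20):.1e}")  # 6.6e+07
```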
 