Update README.md
README.md CHANGED
@@ -16,9 +16,6 @@ Who needs em, we all have em, they're just like us. Unusable models, compute opt
 
 The B, C, and D classes are derived from LLaMA's tokens-per-parameter ratios: LLaMA 65B is nearly Chinchilla-optimal at roughly 21 training tokens per parameter (21M tokens per million parameters). Stepping down the model sizes for a given training set gives us these classes.
 
-We further project E-Class to have a ratio of 264, and F-Class to have a ratio of 490.
-
-
 | Model Name | Parameters | Class | Ratio | Tokens | Batch Size (Tokens) | Training Loss |
 | --- | --- | --- | --- | --- | --- | --- |
 | GerbilLab/Gerbil-A-3.3m | 3.3m | A-Class | 20 | 60M | 65.5k | 6.6644 |
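The class ratios in the table reduce to a one-line calculation: training tokens ≈ ratio × parameters. The sketch below is illustrative only; the function name and the ratio dictionary are not from the GerbilLab repo, and it uses just the ratios stated above (A = 20, plus the projected E = 264 and F = 490), since the B/C/D values are not listed in this excerpt.

```python
# Illustrative sketch: token budgets implied by a tokens-per-parameter ratio.
# Only ratios stated in the README excerpt are used (A = 20, E = 264, F = 490);
# the B/C/D ratios are not listed here, so they are omitted.

CLASS_RATIOS = {"A": 20, "E": 264, "F": 490}


def token_budget(params: float, ratio: float) -> float:
    """Training tokens implied by `ratio` tokens per parameter."""
    return params * ratio


if __name__ == "__main__":
    params = 3.3e6  # GerbilLab/Gerbil-A-3.3m
    for cls, ratio in CLASS_RATIOS.items():
        print(f"{cls}-Class @ {ratio} tokens/param: "
              f"{token_budget(params, ratio) / 1e6:.0f}M tokens")
    # A-Class: 20 * 3.3M ≈ 66M tokens, consistent with the ~60M in the table.
```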