crumb committed
Commit c0c50b8 · Parent: 9ecdcde

Update README.md

Files changed (1): README.md (+6 -0)
README.md CHANGED
@@ -65,6 +65,12 @@ Nearly every base model that isn't finetuned for a specific task was trained on
 
 ```
 
+"Instruct" models have these special tokens:
+
+```
+<prompt> your prompt goes here <output> the model outputs a result here.
+```
+
 Some applications where I can imagine these being useful are: warm-starting very small encoder-decoder models, fitting a new scaling law that takes smaller models into account, or having a "fuzzy wrapper" around an API. They could also be usable on their own (for classification or other tasks) when finetuned on more specific datasets. I don't expect the 3.3m models to be useful for any task whatsoever. Every model was trained on a single GPU: an RTX 2060, an RTX 3060, or a T4.
 
 I'd, uh, appreciate help evaluating all these models, probably with lm harness!!
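
For anyone trying the new prompt format, here is a minimal sketch of what generation might look like, assuming these checkpoints load through `transformers`' causal-LM auto classes; the model id below is a hypothetical placeholder, not one of the real checkpoint names:

```python
# Hedged sketch: query an "instruct" model using the <prompt>/<output>
# special tokens from the diff above. The model id is a made-up placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "crumb/example-instruct-model"  # placeholder, substitute a real checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Wrap the request in the documented format; the model should continue
# generating after the <output> token.
text = "<prompt> your prompt goes here <output>"
inputs = tokenizer(text, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=False))
```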
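
On the scaling-law application: a hedged sketch of the kind of fit that idea suggests, a saturating power law loss(N) = a * N^(-alpha) + c over (parameter count, final loss) pairs. Every data point below is an invented placeholder, not a measured result from these models:

```python
# Hedged sketch: fit loss(N) = a * N**(-alpha) + c to (model size, loss) pairs.
# All numbers here are invented placeholders, not results from these models.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, alpha, c):
    return a * n ** (-alpha) + c

n_params = np.array([3.3e6, 1.2e7, 5.1e7, 1.6e8])  # placeholder parameter counts
losses = np.array([5.8, 4.9, 4.1, 3.6])            # placeholder final losses

(a, alpha, c), _ = curve_fit(power_law, n_params, losses, p0=(100.0, 0.2, 2.0))
print(f"loss(N) ~= {a:.3g} * N**(-{alpha:.3f}) + {c:.2f}")
```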
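
And for the evaluation ask: one way to drive EleutherAI's lm-evaluation-harness ("lm harness") from Python, assuming a v0.4-style API where `lm_eval.simple_evaluate` is available; entry points and argument names may differ between harness versions, and the model id is again a placeholder:

```python
# Hedged sketch: evaluate one checkpoint with lm-evaluation-harness
# (pip install lm-eval). Assumes the v0.4+ Python API; the model id is made up.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                   # Hugging Face transformers backend
    model_args="pretrained=crumb/example-model",  # placeholder checkpoint id
    tasks=["lambada_openai"],
    batch_size=8,
)
print(results["results"])
```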