Text Generation
Transformers
PyTorch
Safetensors
English
gptj
zpn committed
Commit 3ab4c63 · Parent(s): 77a35c8

update benchmarks

Files changed (1)
  1. README.md +14 -6
README.md CHANGED
@@ -64,20 +64,28 @@ Trained on a DGX cluster with 8 A100 80GB GPUs for ~12 hours. Using Deepspeed +
 Results on common sense reasoning benchmarks
 
 ```
-Model                   BoolQ      PIQA       HellaSwag   WinoGrande   ARC-e      ARC-c      OBQA
+Model                   BoolQ      PIQA       HellaSwag   WinoGrande   ARC-e      ARC-c      OBQA
 ----------------------- ---------- ---------- ----------- ------------ ---------- ---------- ----------
 GPT4All-J 6B v1.0       73.4       74.8       63.4        64.7         54.9       36.0       40.2
 GPT4All-J v1.1-breezy   74.0       75.1       63.2        63.6         55.4       34.9       38.4
-GPT4All-J v1.2-jazzy    *74.8*     74.9       63.6        63.8         56.6       35.3       41.0
+GPT4All-J v1.2-jazzy    74.8       74.9       63.6        63.8         56.6       35.3       41.0
 GPT4All-J v1.3-groovy   73.6       74.3       63.8        63.5         57.7       35.0       38.8
 GPT4All-J Lora 6B       68.6       75.8       66.2        63.5         56.4       35.7       40.2
 GPT4All LLaMa Lora 7B   73.1       77.6       72.1        67.8         51.1       40.4       40.2
+GPT4All 13B snoozy      *83.3*     79.2       75.0        *71.3*       60.9       *44.2*     43.4
 Dolly 6B                68.8       77.3       67.6        63.9         62.9       38.7       41.2
-Dolly 12B               56.7       75.4       71.0        62.2         *64.6*     38.5       40.4
+Dolly 12B               56.7       75.4       71.0        62.2         *64.6*     38.5       40.4
 Alpaca 7B               73.9       77.2       73.9        66.1         59.8       43.3       43.4
-Alpaca Lora 7B          74.3       *79.3*     *74.0*      *68.8*       56.6       *43.9*     *42.6*
+Alpaca Lora 7B          74.3       *79.3*     74.0        68.8         56.6       43.9       42.6
 GPT-J 6B                65.4       76.2       66.2        64.1         62.2       36.6       38.2
-LLaMa 7B                73.1       77.4       73.0        66.9         52.5       41.4       42.4
+LLama 7B                73.1       77.4       73.0        66.9         52.5       41.4       42.4
+LLama 13B               68.5       79.1       *76.2*      70.1         60.0       44.6       42.2
 Pythia 6.9B             63.5       76.3       64.0        61.1         61.3       35.2       37.2
-Pythia 12B              67.7       76.6       67.3        63.8         63.9       34.8       38
+Pythia 12B              67.7       76.6       67.3        63.8         63.9       34.8       38.0
+Vicuña T5               81.5       64.6       46.3        61.8         49.3       33.3       39.4
+Vicuña 13B              81.5       76.8       73.3        66.7         57.4       42.7       43.6
+Stable Vicuña RLHF      82.3       78.6       74.1        70.9         61.0       43.5       *44.4*
+StableLM Tuned          62.5       71.2       53.6        54.8         52.4       31.1       33.4
+StableLM Base           60.1       67.4       41.2        50.1         44.9       27.0       32.0
+
 ```
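
The seven benchmark columns in the table (BoolQ, PIQA, HellaSwag, WinoGrande, ARC-e, ARC-c, OBQA) correspond to standard zero-shot tasks in EleutherAI's lm-evaluation-harness. The sketch below shows one common way such numbers are produced; it is an illustrative assumption, not the exact setup behind this table, and the model id nomic-ai/gpt4all-j, harness version, few-shot count, and batch size are guesses.

```
# Hypothetical reproduction sketch using lm-evaluation-harness (v0.3-style API).
# Assumptions: model id, zero-shot setting, and batch size are not taken from this repo.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",                           # HuggingFace causal-LM backend
    model_args="pretrained=nomic-ai/gpt4all-j",  # assumed model id
    tasks=[
        "boolq",          # BoolQ
        "piqa",           # PIQA
        "hellaswag",      # HellaSwag
        "winogrande",     # WinoGrande
        "arc_easy",       # ARC-e
        "arc_challenge",  # ARC-c
        "openbookqa",     # OBQA
    ],
    num_fewshot=0,        # assumed zero-shot
    batch_size=8,
)

# Each task reports acc and, for some tasks, acc_norm; print whichever is present.
for task, metrics in results["results"].items():
    print(task, metrics.get("acc_norm", metrics.get("acc")))
```

Whether a table reports acc or acc_norm for a given task is a common source of small discrepancies between published numbers, so treat this only as a starting point.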