nomic-ai/gpt4all-j

zpn committed on May 4, 2023

Commit 3ab4c63 (1 parent: 77a35c8)

update benchmarks

Files changed (1)

README.md +14 -6

@@ -64,20 +64,28 @@ Trained on a DGX cluster with 8 A100 80GB GPUs for ~12 hours. Using Deepspeed +
 Results on common sense reasoning benchmarks
 ```
- Model                     BoolQ       PIQA     HellaSwag   WinoGrande    ARC-e      ARC-c       OBQA
   ----------------------- ---------- ---------- ----------- ------------ ---------- ---------- ----------
   GPT4All-J 6B v1.0          73.4       74.8       63.4         64.7        54.9       36.0       40.2
   GPT4All-J v1.1-breezy      74.0       75.1       63.2         63.6        55.4       34.9       38.4
-  GPT4All-J v1.2-jazzy      *74.8*      74.9       63.6         63.8        56.6       35.3       41.0
   GPT4All-J v1.3-groovy      73.6       74.3       63.8         63.5        57.7       35.0       38.8
   GPT4All-J Lora 6B          68.6       75.8       66.2         63.5        56.4       35.7       40.2
   GPT4All LLaMa Lora 7B      73.1       77.6       72.1         67.8        51.1       40.4       40.2
   Dolly 6B                   68.8       77.3       67.6         63.9        62.9       38.7       41.2
-  Dolly 12B                  56.7       75.4       71.0         62.2       *64.6*      38.5        40.4
   Alpaca 7B                  73.9       77.2       73.9         66.1        59.8       43.3       43.4
-  Alpaca Lora 7B             74.3      *79.3*     *74.0*       *68.8*       56.6      *43.9*     *42.6*
   GPT-J 6B                   65.4       76.2       66.2         64.1        62.2       36.6       38.2
-  LLaMa 7B                   73.1       77.4       73.0         66.9        52.5       41.4       42.4
   Pythia 6.9B                63.5       76.3       64.0         61.1        61.3       35.2       37.2
-  Pythia 12B                 67.7       76.6       67.3         63.8        63.9       34.8        38
 ```

 Results on common sense reasoning benchmarks
 ```
+  Model                     BoolQ       PIQA     HellaSwag   WinoGrande    ARC-e      ARC-c       OBQA
   ----------------------- ---------- ---------- ----------- ------------ ---------- ---------- ----------
   GPT4All-J 6B v1.0          73.4       74.8       63.4         64.7        54.9       36.0       40.2
   GPT4All-J v1.1-breezy      74.0       75.1       63.2         63.6        55.4       34.9       38.4
+  GPT4All-J v1.2-jazzy       74.8       74.9       63.6         63.8        56.6       35.3       41.0
   GPT4All-J v1.3-groovy      73.6       74.3       63.8         63.5        57.7       35.0       38.8
   GPT4All-J Lora 6B          68.6       75.8       66.2         63.5        56.4       35.7       40.2
   GPT4All LLaMa Lora 7B      73.1       77.6       72.1         67.8        51.1       40.4       40.2
+  GPT4All 13B snoozy        *83.3*      79.2       75.0        *71.3*       60.9      *44.2*      43.4
   Dolly 6B                   68.8       77.3       67.6         63.9        62.9       38.7       41.2
+  Dolly 12B                  56.7       75.4       71.0         62.2       *64.6*      38.5       40.4
   Alpaca 7B                  73.9       77.2       73.9         66.1        59.8       43.3       43.4
+  Alpaca Lora 7B             74.3      *79.3*      74.0         68.8        56.6       43.9       42.6
   GPT-J 6B                   65.4       76.2       66.2         64.1        62.2       36.6       38.2
+  LLama 7B                   73.1       77.4       73.0         66.9        52.5       41.4       42.4
+  LLama 13B                  68.5       79.1      *76.2*        70.1        60.0       44.6       42.2
   Pythia 6.9B                63.5       76.3       64.0         61.1        61.3       35.2       37.2
+  Pythia 12B                 67.7       76.6       67.3         63.8        63.9       34.8       38.0
+  Vicuña T5                  81.5       64.6       46.3         61.8        49.3       33.3       39.4
+  Vicuña 13B                 81.5       76.8       73.3         66.7        57.4       42.7       43.6
+  Stable Vicuña RLHF         82.3       78.6       74.1         70.9        61.0       43.5      *44.4*
+  StableLM Tuned             62.5       71.2       53.6         54.8        52.4       31.1       33.4
+  StableLM Base              60.1       67.4       41.2         50.1        44.9       27.0       32.0
 ```
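
The asterisks in the table appear to mark the best score in each benchmark column. The following is a minimal Python sketch, not part of the commit, that re-derives those per-column maxima from the updated rows. The names `TABLE`, `parse_table`, and `best_per_benchmark` are hypothetical helpers introduced here for illustration; the parser assumes the last seven whitespace-separated tokens on each row are scores and everything before them is the model name.

```python
# Updated table rows as they appear on the "+" side of the diff
# (separator line omitted; asterisks kept and stripped during parsing).
TABLE = """\
Model                     BoolQ       PIQA     HellaSwag   WinoGrande    ARC-e      ARC-c       OBQA
GPT4All-J 6B v1.0          73.4       74.8       63.4         64.7        54.9       36.0       40.2
GPT4All-J v1.1-breezy      74.0       75.1       63.2         63.6        55.4       34.9       38.4
GPT4All-J v1.2-jazzy       74.8       74.9       63.6         63.8        56.6       35.3       41.0
GPT4All-J v1.3-groovy      73.6       74.3       63.8         63.5        57.7       35.0       38.8
GPT4All-J Lora 6B          68.6       75.8       66.2         63.5        56.4       35.7       40.2
GPT4All LLaMa Lora 7B      73.1       77.6       72.1         67.8        51.1       40.4       40.2
GPT4All 13B snoozy        *83.3*      79.2       75.0        *71.3*       60.9      *44.2*      43.4
Dolly 6B                   68.8       77.3       67.6         63.9        62.9       38.7       41.2
Dolly 12B                  56.7       75.4       71.0         62.2       *64.6*      38.5       40.4
Alpaca 7B                  73.9       77.2       73.9         66.1        59.8       43.3       43.4
Alpaca Lora 7B             74.3      *79.3*      74.0         68.8        56.6       43.9       42.6
GPT-J 6B                   65.4       76.2       66.2         64.1        62.2       36.6       38.2
LLama 7B                   73.1       77.4       73.0         66.9        52.5       41.4       42.4
LLama 13B                  68.5       79.1      *76.2*        70.1        60.0       44.6       42.2
Pythia 6.9B                63.5       76.3       64.0         61.1        61.3       35.2       37.2
Pythia 12B                 67.7       76.6       67.3         63.8        63.9       34.8       38.0
Vicuña T5                  81.5       64.6       46.3         61.8        49.3       33.3       39.4
Vicuña 13B                 81.5       76.8       73.3         66.7        57.4       42.7       43.6
Stable Vicuña RLHF         82.3       78.6       74.1         70.9        61.0       43.5      *44.4*
StableLM Tuned             62.5       71.2       53.6         54.8        52.4       31.1       33.4
StableLM Base              60.1       67.4       41.2         50.1        44.9       27.0       32.0
"""


def parse_table(text):
    """Return (benchmark names, {model name: {benchmark: score}})."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    benchmarks = lines[0].split()[1:]  # drop the "Model" header cell
    n = len(benchmarks)
    rows = {}
    for ln in lines[1:]:
        tokens = ln.split()
        # Model names contain spaces, so take the scores from the right.
        name = " ".join(tokens[:-n])
        scores = [float(tok.strip("*")) for tok in tokens[-n:]]
        rows[name] = dict(zip(benchmarks, scores))
    return benchmarks, rows


def best_per_benchmark(benchmarks, rows):
    """Map each benchmark to the model with the highest score (first wins on ties)."""
    return {b: max(rows, key=lambda model: rows[model][b]) for b in benchmarks}


if __name__ == "__main__":
    benchmarks, rows = parse_table(TABLE)
    for bench, model in best_per_benchmark(benchmarks, rows).items():
        print(f"{bench:<11} {model:<22} {rows[model][bench]}")
```

On the values shown in this diff, the recomputed maxima match the asterisks in every column except ARC-c, where LLama 13B (44.6) scores higher than the starred GPT4All 13B snoozy (44.2).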