jaspercatapang committed
Commit c736208 · 1 Parent(s): c83245b

Update README.md

Files changed (1): README.md (+5 -3)
README.md CHANGED
@@ -31,11 +31,13 @@ According to the leaderboard description, here are the benchmarks used for the e
  - [HellaSwag](https://arxiv.org/abs/1905.07830) (10-shot) - a test of commonsense inference, which is easy for humans (~95%) but challenging for SOTA models.
  - [TruthfulQA](https://arxiv.org/abs/2109.07958) (0-shot) - a test to measure a model’s propensity to reproduce falsehoods commonly found online.

- ## Leaderboard Highlights (as of July 21, 2023)
+ ## Leaderboard Highlights (as of July 22, 2023)
  - GodziLLa-30B is on par with [Falcon-40B-instruct](https://huggingface.co/tiiuae/falcon-40b-instruct) (June 2023's Rank #1).
  - GodziLLa-30B outperforms Meta AI's LLaMA [30B and 65B](https://ai.meta.com/blog/large-language-model-llama-meta-ai/) models.
- - GodziLLa-30B ranks 3rd worldwide, for open-source LLMs, on the [TruthfulQA](https://arxiv.org/abs/2109.07958) benchmark.
- - GodziLLa-30B is on par with [GPT-3.5 175B](https://platform.openai.com/docs/models/gpt-3-5) (text-davinci-003) on the [HellaSwag](https://arxiv.org/abs/1905.07830) benchmark and performs closely (< 3%) on the [MMLU](https://arxiv.org/abs/2009.03300) benchmark.
+ - GodziLLa-30B ranks 4th worldwide, for open-source LLMs, on the [TruthfulQA](https://arxiv.org/abs/2109.07958) benchmark.
+ - GodziLLa-30B beats [GPT-3.5 175B](https://platform.openai.com/docs/models/gpt-3-5) (text-davinci-003) on the [TruthfulQA](https://arxiv.org/abs/2109.07958) benchmark and performs closely (< 4%) on the [HellaSwag](https://arxiv.org/abs/1905.07830) benchmark.*
+
+ *Based on a [leaderboard clone](https://huggingface.co/spaces/gsaivinay/open_llm_leaderboard) with GPT-3.5 and GPT-4 included.

  ## Recommended Prompt Format
  Alpaca's instruction is the recommended prompt format, but Vicuna's instruction format may also work.
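
For readers unfamiliar with the Alpaca format recommended above, here is a minimal Python sketch of that instruction template. The template wording follows the original Alpaca release, and `build_prompt` is a hypothetical helper rather than part of this repository; check the model card's own examples before relying on it.

```python
# Minimal sketch of the Alpaca-style instruction template.
# Assumption: the template wording matches the original Alpaca release;
# verify against the model card's examples.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)

def build_prompt(instruction: str) -> str:
    # Hypothetical helper: wraps a user instruction in the template.
    return ALPACA_TEMPLATE.format(instruction=instruction)

if __name__ == "__main__":
    print(build_prompt("List three commonsense-reasoning benchmarks."))
```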