euclaise committed on
Commit
966c190
1 Parent(s): aad5c9a

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -73,7 +73,7 @@ Memphis outperforms human-data models that are over twice its size, along with S
 Note that BBH results have wide SEs, exceeding 16%.
 
 
-It is unclear why Zephyr performs so poorly on BBH. Perhaps it is overfit.
+It is unclear why Zephyr performs so poorly on BBH. Perhaps it is overfit, or maybe there was an issue with vllm.
 
 Notes:
 - Evaluations were performed using the `agieval` branch of [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) (commit `0bef5c9c273b1c2f68e6018d4bb9c32b9aaff298`), using the `vllm` model.