|
We released **Abel-7B-002**, resulting in a stronger (35% improvement on GSM8K, 126% improvement on MATH) and more generalized model, achieving the best performance among all 7B models (80.44 on GSM8K, 29.46 on MATH) |
|
|
|
| Model | GSM8k | MATH |MathQA | SVAMP |SCQ5K-EN | ARC-E|ARC-C|HellaSwag|MMLU | |
|
|-----------|------------|----------|--------------|-----------|----------------|---------|----------|---------------|----------| |
|
|Abel-7B-002 | **80.44** | **29.46** | **69.78** |77.67 |**55.95** |77.67 |**55.05** |77.72 |61.19 | |
|
|Abel-7B-001 |59.74 |13 |1.21 |57.67 |9.3 |53.32 |38.97 |63.51|40.59 | |
|
|MetaMath-Mistral-7B|77.7 |28.2 |33.94 |**79.33** |37.6| **78.48** |51.93 |76.44| 61.93| |
|
|Qwen-7b|47.84 |9.34 |27.44 |53 |40.05 |74.97 |53.05 |**86.85**|57.98 | |
|
|Mistral-7b|37.83 |9.06 |25.73 |63 |39.6 |76.83 |53.22| 76.31|**64.05** | |
|
|Yi-6b| 32.6 |5.78 |26.98 |55.67 |35.5 |73.66 |49.53 |68.97|64.02 | |
|
|LLaMA2-7b|12.96 |2.78 |11.52 |44 |28.24 |71.12 |46.61 |71.32|46.7 | |
|
|
|
Please cite the repo if the model/code/conclusion in this repo are helpful to you. |
|
``` |
|
@misc{abel, |
|
author = {Chern, Ethan and Zou, Haoyang and Li, Xuefeng and Hu, Jiewen and Feng, Kehua and Li, Junlong and Liu, Pengfei}, |
|
title = {Generative AI for Math: Abel}, |
|
year = {2023}, |
|
publisher = {GitHub}, |
|
journal = {GitHub repository}, |
|
howpublished = {\url{https://github.com/GAIR-NLP/abel}}, |
|
} |
|
``` |