RWKV/v5-EagleX-v2-7B-pth


m8than committed on Apr 17, 2024

Commit 12e8b13 (1 parent: bdd45a8)

initial commit


Files changed (2)

README.md +93 -0
v5-EagleX-v2-7B.pth +3 -0

README.md ADDED

	@@ -0,0 +1,93 @@

+---
+license: apache-2.0
+---
+![An eagle soaring above a transformer robot](https://substackcdn.com/image/fetch/w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F10cf7fd1-6c72-4a99-84c2-794fb7bc52b3_2432x1664.png)
+### RWKV EagleX 7B v2 Model
+> **! Important Note !**
+>
+> The following is the raw representation of the EagleX 7B v2 model. **It is not meant to be used with the Hugging Face transformers library.**
+> It is an experimental model, intended for research purposes.
+>
+>
+> This is not an instruction-tuned model! (soon...)
+## Evaluation
+The following table shows the model's progression from 1.1T to 2.25T training tokens.
+|Model                 |Eagle-7B-HF|EagleX-7B-HF-v1|EagleX-7B-HF-v2|
+|----------------------|-----------|---------------|---------------|
+|Param Count           |7.52 B     |7.52 B         |7.52 B         |
+|Tokens Trained        |1.1 T      |1.7 T          |2.25 T         |
+|avg_acc               |0.4822     |0.5391         |0.5495         |
+|glue (acc)            |0.5752     |0.7463         |0.7439         |
+|anli (acc)            |0.3594     |0.4847         |0.5097         |
+|mnli (acc)            |0.3802     |0.7928         |0.7884         |
+|mnli_mismatch (acc)   |0.3687     |0.7985         |0.784          |
+|swag (acc)            |0.568      |0.5814         |0.5905         |
+|lambada_standard (acc)|0.685      |0.686          |0.7004         |
+|lambada_openai (acc)  |0.7425     |0.7522         |0.7502         |
+|mmlu (acc)            |0.3321     |0.4014         |0.438          |
+|winogrande (acc)      |0.674      |0.7206         |0.7332         |
+|wnli (acc)            |0.4225     |0.4648         |0.493          |
+|truthfulqa (acc)      |0.3303     |0.3268         |0.3401         |
+|logiqa (acc)          |0.2458     |0.2458         |0.2458         |
+|logiqa2 (acc)         |0.2494     |0.2595         |0.2621         |
+|sciq (acc)            |0.955      |0.96           |0.93           |
+|piqa (acc)            |0.7704     |0.7758         |0.7764         |
+|arc_easy (acc)        |0.7382     |0.7555         |0.7445         |
+|arc_challenge (acc)   |0.3951     |0.4087         |0.4155         |
+|hellaswag (acc)       |0.5264     |0.5411         |0.56           |
+|openbookqa (acc)      |0.302      |0.296          |0.304          |
+|mathqa (acc)          |0.26       |0.26           |0.2593         |
+|arithmetic (acc)      |0.245      |0.0634         |0.1703         |
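
The README does not say how `avg_acc` is derived. The figures in the table above appear consistent with an unweighted mean over the 21 task rows, and the sketch below checks that assumption for the EagleX-7B-HF-v2 column. The dictionary and function names are illustrative only; the numbers are copied from the table.

```python
from statistics import mean

# Per-task accuracies copied from the EagleX-7B-HF-v2 column of the table above.
eaglex_v2 = {
    "glue": 0.7439, "anli": 0.5097, "mnli": 0.7884, "mnli_mismatch": 0.784,
    "swag": 0.5905, "lambada_standard": 0.7004, "lambada_openai": 0.7502,
    "mmlu": 0.438, "winogrande": 0.7332, "wnli": 0.493, "truthfulqa": 0.3401,
    "logiqa": 0.2458, "logiqa2": 0.2621, "sciq": 0.93, "piqa": 0.7764,
    "arc_easy": 0.7445, "arc_challenge": 0.4155, "hellaswag": 0.56,
    "openbookqa": 0.304, "mathqa": 0.2593, "arithmetic": 0.1703,
}


def average_accuracy(scores):
    """Unweighted mean over all tasks, rounded to four decimals like the table."""
    # Assumption: every task counts equally, regardless of its size.
    return round(mean(scores.values()), 4)


print(average_accuracy(eaglex_v2))  # prints 0.5495, matching the avg_acc row
```

Because the mean is unweighted, a single task with a large swing (for example `arithmetic`) moves `avg_acc` as much as any other task.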
+The following table compares EagleX 7B v2 against other top-performing models in the same weight class.
+|Model                 |EleutherAI/pythia-6.9b|aisingapore/sealion7b|RedPajama-INCITE-7B-Base|EleutherAI/gpt-j-6b|tiiuae/falcon-rw-7b|allenai/OLMo-7B|mosaicml/mpt-7b|tiiuae/falcon-7b|Llama-2-7b-hf|EagleX-7B-HF-v2|Mistral-7B-v0.1|
+|----------------------|----------------------|---------------------|------------------------|-------------------|-------------------|---------------|---------------|----------------|-------------|---------------|---------------|
+|Param Count           |6.86 B                |7.5 B                |6.86 B                  |6.05B              |6.92 B             |6.89 B         |6.7 B          |6.92 B          |6.74 B       |7.52 B         |7.24 B         |
+|Tokens Trained        |0.3 T                 |0.98 T               |1 T                     |0.4 T              |0.35 T             |2.5 T          |1 T            |1.5 T           |2 T          |2.25 T         |2 - 7 T?       |
+|avg_acc               |0.4237                |0.4326               |0.4411                  |0.4456             |0.4516             |0.4578         |0.4641         |0.4775          |0.5045       |0.5495         |0.5676         |
+|glue (acc)            |0.4765                |0.4483               |0.4748                  |0.455              |0.4825             |0.474          |0.4874         |0.4578          |0.4289       |0.7439         |0.515          |
+|anli (acc)            |0.3353                |0.3478               |0.3528                  |0.3391             |0.3344             |0.3478         |0.3403         |0.3541          |0.3697       |0.5097         |0.3803         |
+|mnli (acc)            |0.37                  |0.3657               |0.336                   |0.3768             |0.3632             |0.3294         |0.3784         |0.3893          |0.4269       |0.7884         |0.4542         |
+|mnli_mismatch (acc)   |0.3716                |0.3696               |0.327                   |0.3789             |0.3708             |0.3348         |0.3751         |0.404           |0.4395       |0.784          |0.4632         |
+|swag (acc)            |0.5368                |0.5217               |0.5493                  |0.5472             |0.5483             |0.5512         |0.5616         |0.5685          |0.5658       |0.5905         |0.5756         |
+|lambada_standard (acc)|0.5201                |0.5777               |0.6078                  |0.6097             |0.6062             |0.6396         |0.6208         |0.6868          |0.6808       |0.7004         |0.6944         |
+|lambada_openai (acc)  |0.609                 |0.6377               |0.7023                  |0.6779             |0.6332             |0.6872         |0.6872         |0.746           |0.7353       |0.7502         |0.7553         |
+|mmlu (acc)            |0.2594                |0.2705               |0.2618                  |0.2648             |0.256              |0.2812         |0.2913         |0.2512          |0.4077       |0.438          |0.5964         |
+|winogrande (acc)      |0.6148                |0.6054               |0.6504                  |0.6417             |0.6598             |0.6725         |0.6811         |0.6709          |0.6914       |0.7332         |0.7364         |
+|wnli (acc)            |0.3944                |0.5352               |0.5915                  |0.507              |0.507              |0.5775         |0.4789         |0.4789          |0.4648       |0.493          |0.5775         |
+|truthfulqa (acc)      |0.313                 |0.2783               |0.2957                  |0.3081             |0.2945             |0.3015         |0.2708         |0.2826          |0.3205       |0.3401         |0.3537         |
+|logiqa (acc)          |0.2381                |0.2212               |0.2289                  |0.212              |0.2181             |0.2335         |0.232          |0.2151          |0.2535       |0.2458         |0.2427         |
+|logiqa2 (acc)         |0.2239                |0.2188               |0.243                   |0.2316             |0.2354             |0.2506         |0.2525         |0.2252          |0.2564       |0.2621         |0.3022         |
+|sciq (acc)            |0.889                 |0.918                |0.925                   |0.914              |0.932              |0.927          |0.939          |0.944           |0.939        |0.93           |0.959          |
+|piqa (acc)            |0.7476                |0.7601               |0.5247                  |0.753              |0.7758             |0.7878         |0.7933         |0.7949          |0.7807       |0.7764         |0.8052         |
+|arc_easy (acc)        |0.6654                |0.678                |0.7193                  |0.6713             |0.7184             |0.7353         |0.7492         |0.7479          |0.7643       |0.7445         |0.8081         |
+|arc_challenge (acc)   |0.32                  |0.3183               |0.3686                  |0.3396             |0.366              |0.3677         |0.3968         |0.4027          |0.4309       |0.4155         |0.5009         |
+|hellaswag (acc)       |0.4768                |0.5015               |0.5247                  |0.4955             |0.5399             |0.5572         |0.5723         |0.5772          |0.5713       |0.56           |0.6131         |
+|openbookqa (acc)      |0.248                 |0.236                |0.292                   |0.288              |0.314              |0.292          |0.322          |0.306           |0.316        |0.304          |0.33           |
+|mathqa (acc)          |0.26                  |0.2372               |0.2623                  |0.2633             |0.26               |0.26           |0.26           |0.2884          |0.2801       |0.2593         |0.3554         |
+|arithmetic (acc)      |0.0271                |0.0379               |0.0254                  |0.0832             |0.0669             |0.0069         |0.0562         |0.2367          |0.4703       |0.1703         |0.9004         |
+For full details on this model, see: [https://blog.rwkv.com/p/336f47bf-d8e9-4174-ac1d-02c6c8a99bc0](https://blog.rwkv.com/p/336f47bf-d8e9-4174-ac1d-02c6c8a99bc0)
+## Links
+- [HF Demo](###)
+- [Our wiki](https://wiki.rwkv.com)
+- [Full eval data](https://docs.google.com/spreadsheets/d/1CBLU6yKkW-8FMvGD4INO3qjeHZ0qkKnZFcM6n6lWNOs/edit#gid=912381775)
+## Acknowledgement
+We are grateful for the help and support from the following key groups:
+- [Recursal.ai](https://recursal.ai) team for financing the GPU resources, and managing the training of this foundation model - you can run the Eagle line of RWKV models on their cloud / on-premise platform today.
+- EleutherAI for their support, especially in the v5/v6 Eagle/Finch paper
+- Linux Foundation AI & Data group for supporting and hosting the RWKV project

v5-EagleX-v2-7B.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b4a07adb69c1ed8b8bcd3f5636914a0bfd67a1452ae2a77fbd8f193261761956
+size 15036198570
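
The three lines above are a Git LFS pointer: the repository stores the expected SHA-256 digest and byte size rather than the weights themselves. As a minimal sketch, a downloaded copy could be checked against the pointer as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any Git LFS tooling.

```python
import hashlib
from pathlib import Path

# Pointer contents as committed for v5-EagleX-v2-7B.pth.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:b4a07adb69c1ed8b8bcd3f5636914a0bfd67a1452ae2a77fbd8f193261761956
size 15036198570
"""


def parse_lfs_pointer(text):
    """Split the 'key value' lines of a pointer file into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest in `pointer`."""
    path = Path(path)
    # Cheap check first: a size mismatch avoids hashing a multi-gigabyte file.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as fh:
        # Read in chunks so the whole file never has to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

For example, `matches_pointer("v5-EagleX-v2-7B.pth", parse_lfs_pointer(POINTER_TEXT))` would confirm that a local download is complete and uncorrupted. Hashing roughly 15 GB takes a while, so the size check runs first.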