initial commit
- README.md +93 -0
- v5-EagleX-v2-7B.pth +3 -0
README.md
ADDED
@@ -0,0 +1,93 @@
---
license: apache-2.0
---

### RWKV EagleX 7B v2 Model

> **! Important Note !**
>
> The following is the raw representation of the EagleX 7B v2 model. **It is not meant to be used with the Hugging Face transformers library.**
> It is an experimental model, intended for research purposes.
>
> This is not an instruct-tuned model! (soon...)

## Evaluation

The following table shows the model's progression from 1.1T to 2.25T training tokens.

|Model |Eagle-7B-HF|EagleX-7B-HF-v1|EagleX-7B-HF-v2|
|----------------------|-----------|---------------|---------------|
|Param Count |7.52 B |7.52 B |7.52 B |
|Tokens Trained |1.1 T |1.7 T |2.25 T |
|avg_acc |0.4822 |0.5391 |0.5495 |
|glue (acc) |0.5752 |0.7463 |0.7439 |
|anli (acc) |0.3594 |0.4847 |0.5097 |
|mnli (acc) |0.3802 |0.7928 |0.7884 |
|mnli_mismatch (acc) |0.3687 |0.7985 |0.784 |
|swag (acc) |0.568 |0.5814 |0.5905 |
|lambada_standard (acc)|0.685 |0.686 |0.7004 |
|lambada_openai (acc) |0.7425 |0.7522 |0.7502 |
|mmlu (acc) |0.3321 |0.4014 |0.438 |
|winogrande (acc) |0.674 |0.7206 |0.7332 |
|wnli (acc) |0.4225 |0.4648 |0.493 |
|truthfulqa (acc) |0.3303 |0.3268 |0.3401 |
|logiqa (acc) |0.2458 |0.2458 |0.2458 |
|logiqa2 (acc) |0.2494 |0.2595 |0.2621 |
|sciq (acc) |0.955 |0.96 |0.93 |
|piqa (acc) |0.7704 |0.7758 |0.7764 |
|arc_easy (acc) |0.7382 |0.7555 |0.7445 |
|arc_challenge (acc) |0.3951 |0.4087 |0.4155 |
|hellaswag (acc) |0.5264 |0.5411 |0.56 |
|openbookqa (acc) |0.302 |0.296 |0.304 |
|mathqa (acc) |0.26 |0.26 |0.2593 |
|arithmetic (acc) |0.245 |0.0634 |0.1703 |
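
The `avg_acc` row appears to be the unweighted mean of the 21 benchmark rows listed under it. This is an inference from the numbers, not something the table states. A minimal Python sketch that checks it for the EagleX-7B-HF-v2 column (the dictionary and function names are illustrative only):

```python
# Per-benchmark accuracies for EagleX-7B-HF-v2, copied from the table above.
eaglex_v2 = {
    "glue": 0.7439,
    "anli": 0.5097,
    "mnli": 0.7884,
    "mnli_mismatch": 0.784,
    "swag": 0.5905,
    "lambada_standard": 0.7004,
    "lambada_openai": 0.7502,
    "mmlu": 0.438,
    "winogrande": 0.7332,
    "wnli": 0.493,
    "truthfulqa": 0.3401,
    "logiqa": 0.2458,
    "logiqa2": 0.2621,
    "sciq": 0.93,
    "piqa": 0.7764,
    "arc_easy": 0.7445,
    "arc_challenge": 0.4155,
    "hellaswag": 0.56,
    "openbookqa": 0.304,
    "mathqa": 0.2593,
    "arithmetic": 0.1703,
}


def mean_accuracy(scores):
    """Return the unweighted mean of the per-benchmark accuracies."""
    return sum(scores.values()) / len(scores)


avg = mean_accuracy(eaglex_v2)
# Assumption being checked: avg_acc is a plain mean over all listed rows.
print(f"{len(eaglex_v2)} benchmarks, mean accuracy = {avg:.4f}")
```

Because the mean is unweighted, a large swing on a single task (for example `arithmetic`) moves `avg_acc` as much as the same swing on any other benchmark.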

The following table compares EagleX 7B v2 against other top-performing models in the same weight class.

|Model |EleutherAI/pythia-6.9b|aisingapore/sealion7b|RedPajama-INCITE-7B-Base|EleutherAI/gpt-j-6b|tiiuae/falcon-rw-7b|allenai/OLMo-7B|mosaicml/mpt-7b|tiiuae/falcon-7b|Llama-2-7b-hf|EagleX-7B-HF-v2|Mistral-7B-v0.1|
|----------------------|----------------------|---------------------|------------------------|-------------------|-------------------|---------------|---------------|----------------|-------------|---------------|---------------|
|Param Count |6.86 B |7.5 B |6.86 B |6.05 B |6.92 B |6.89 B |6.7 B |6.92 B |6.74 B |7.52 B |7.24 B |
|Tokens Trained |0.3 T |0.98 T |1 T |0.4 T |0.35 T |2.5 T |1 T |1.5 T |2 T |2.25 T |2 - 7 T? |
|avg_acc |0.4237 |0.4326 |0.4411 |0.4456 |0.4516 |0.4578 |0.4641 |0.4775 |0.5045 |0.5495 |0.5676 |
|glue (acc) |0.4765 |0.4483 |0.4748 |0.455 |0.4825 |0.474 |0.4874 |0.4578 |0.4289 |0.7439 |0.515 |
|anli (acc) |0.3353 |0.3478 |0.3528 |0.3391 |0.3344 |0.3478 |0.3403 |0.3541 |0.3697 |0.5097 |0.3803 |
|mnli (acc) |0.37 |0.3657 |0.336 |0.3768 |0.3632 |0.3294 |0.3784 |0.3893 |0.4269 |0.7884 |0.4542 |
|mnli_mismatch (acc) |0.3716 |0.3696 |0.327 |0.3789 |0.3708 |0.3348 |0.3751 |0.404 |0.4395 |0.784 |0.4632 |
|swag (acc) |0.5368 |0.5217 |0.5493 |0.5472 |0.5483 |0.5512 |0.5616 |0.5685 |0.5658 |0.5905 |0.5756 |
|lambada_standard (acc)|0.5201 |0.5777 |0.6078 |0.6097 |0.6062 |0.6396 |0.6208 |0.6868 |0.6808 |0.7004 |0.6944 |
|lambada_openai (acc) |0.609 |0.6377 |0.7023 |0.6779 |0.6332 |0.6872 |0.6872 |0.746 |0.7353 |0.7502 |0.7553 |
|mmlu (acc) |0.2594 |0.2705 |0.2618 |0.2648 |0.256 |0.2812 |0.2913 |0.2512 |0.4077 |0.438 |0.5964 |
|winogrande (acc) |0.6148 |0.6054 |0.6504 |0.6417 |0.6598 |0.6725 |0.6811 |0.6709 |0.6914 |0.7332 |0.7364 |
|wnli (acc) |0.3944 |0.5352 |0.5915 |0.507 |0.507 |0.5775 |0.4789 |0.4789 |0.4648 |0.493 |0.5775 |
|truthfulqa (acc) |0.313 |0.2783 |0.2957 |0.3081 |0.2945 |0.3015 |0.2708 |0.2826 |0.3205 |0.3401 |0.3537 |
|logiqa (acc) |0.2381 |0.2212 |0.2289 |0.212 |0.2181 |0.2335 |0.232 |0.2151 |0.2535 |0.2458 |0.2427 |
|logiqa2 (acc) |0.2239 |0.2188 |0.243 |0.2316 |0.2354 |0.2506 |0.2525 |0.2252 |0.2564 |0.2621 |0.3022 |
|sciq (acc) |0.889 |0.918 |0.925 |0.914 |0.932 |0.927 |0.939 |0.944 |0.939 |0.93 |0.959 |
|piqa (acc) |0.7476 |0.7601 |0.5247 |0.753 |0.7758 |0.7878 |0.7933 |0.7949 |0.7807 |0.7764 |0.8052 |
|arc_easy (acc) |0.6654 |0.678 |0.7193 |0.6713 |0.7184 |0.7353 |0.7492 |0.7479 |0.7643 |0.7445 |0.8081 |
|arc_challenge (acc) |0.32 |0.3183 |0.3686 |0.3396 |0.366 |0.3677 |0.3968 |0.4027 |0.4309 |0.4155 |0.5009 |
|hellaswag (acc) |0.4768 |0.5015 |0.5247 |0.4955 |0.5399 |0.5572 |0.5723 |0.5772 |0.5713 |0.56 |0.6131 |
|openbookqa (acc) |0.248 |0.236 |0.292 |0.288 |0.314 |0.292 |0.322 |0.306 |0.316 |0.304 |0.33 |
|mathqa (acc) |0.26 |0.2372 |0.2623 |0.2633 |0.26 |0.26 |0.26 |0.2884 |0.2801 |0.2593 |0.3554 |
|arithmetic (acc) |0.0271 |0.0379 |0.0254 |0.0832 |0.0669 |0.0069 |0.0562 |0.2367 |0.4703 |0.1703 |0.9004 |

For full details on this model, see [https://blog.rwkv.com/p/336f47bf-d8e9-4174-ac1d-02c6c8a99bc0](https://blog.rwkv.com/p/336f47bf-d8e9-4174-ac1d-02c6c8a99bc0)

## Links
- [HF Demo](###)
- [Our wiki](https://wiki.rwkv.com)
- [Full eval data](https://docs.google.com/spreadsheets/d/1CBLU6yKkW-8FMvGD4INO3qjeHZ0qkKnZFcM6n6lWNOs/edit#gid=912381775)

## Acknowledgement
We are grateful for the help and support from the following key groups:

- [Recursal.ai](https://recursal.ai) team for financing the GPU resources and managing the training of this foundation model. You can run the Eagle line of RWKV models on their cloud / on-premise platform today.
- EleutherAI for their support, especially with the v5/v6 Eagle/Finch paper
- Linux Foundation AI & Data group for supporting and hosting the RWKV project

v5-EagleX-v2-7B.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b4a07adb69c1ed8b8bcd3f5636914a0bfd67a1452ae2a77fbd8f193261761956
size 15036198570
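
The three lines above are a Git LFS pointer: they record the hash algorithm, digest, and byte size of the real weights file rather than the weights themselves. A minimal Python sketch of checking a downloaded `.pth` against such a pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local file path is whatever you saved the download as.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest in `pointer`."""
    # Cheap check first: a size mismatch means the download is incomplete.
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    # Hash in chunks so a multi-gigabyte file never has to fit in memory.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer contents copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:b4a07adb69c1ed8b8bcd3f5636914a0bfd67a1452ae2a77fbd8f193261761956
size 15036198570
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("v5-EagleX-v2-7B.pth", pointer)` on a completed download should then return `True`. A `False` result suggests a truncated or corrupted file.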