Added image. Reworded initial remarks.
README.md
CHANGED
@@ -4,12 +4,17 @@ license_name: tongyi-qianwen
library_name: transformers
---

-

-
- For existing details of the process from our previous release, find it [here]: https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1

-

| Tasks | Metric | Qwerky-QwQ-32B | Qwen/QwQ-32B | Qwerky-72B | Qwen2.5-72B-Instruct |
|:---:|:---:|:---:|:---:|:---:|:---:|
@@ -22,4 +27,4 @@ Benchmarks for Qwerky-QwQ-32B and the Qwerky-72B models
| winogrande | acc | **0.7324** | 0.7048 | **0.7956** | 0.7632 |
| mmlu | acc | 0.7431 | **0.7985** | 0.7746 | **0.8338** |

- > All
library_name: transformers
---

+

+ Linear models offer a promising approach to significantly reducing computational costs at scale, particularly for large context lengths, enabling a >1000x improvement in inference cost. This opens the door to o1-style inference-time thinking and wider AI accessibility.
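The ">1000x" figure is the release's own claim; as a rough sketch of the standard scaling argument behind it (not taken from the card itself): a softmax-attention layer attends over the full context for every new token, while an RWKV-style linear-attention layer only updates a fixed-size state. For a context of length $T$ and hidden size $d$:

$$
\underbrace{\sum_{t=1}^{T} O(t \cdot d)}_{\text{softmax attention}} = O(T^2 d)
\qquad \text{vs.} \qquad
\underbrace{\sum_{t=1}^{T} O(d^2)}_{\text{linear attention, constant work per token}} = O(T d^2)
$$

The per-token gap (and the KV-cache memory it implies) therefore grows roughly linearly with context length.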

+ As demonstrated with our Qwerky-72B-Preview and prior models such as QRWKV6-32B Instruct Preview, we have successfully converted Qwen 2.5 72B into an RWKV variant without pretraining the base model or retraining it from scratch. This allows us to test and validate the more efficient RWKV linear attention with a much smaller budget. Since our preview, we have continued to refine our technique and have improved the model over the preview iteration.
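Since the card lists `library_name: transformers`, the converted checkpoint should be loadable through the standard `transformers` API. The sketch below is illustrative only: the repo id `recursal/Qwerky-72B`, the use of `trust_remote_code=True` for custom RWKV-variant modeling code, and the inherited Qwen chat template are assumptions, not details confirmed by the card.

```python
# Hypothetical loading sketch -- repo id and trust_remote_code are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "recursal/Qwerky-72B"  # assumed repo id; adjust to the actual checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # pick bf16/fp16 automatically where available
    device_map="auto",       # shard the 72B weights across available GPUs
    trust_remote_code=True,  # assumed: custom RWKV-variant modeling code
)

# Assuming the Qwen-style chat template is inherited from the parent instruct model
messages = [{"role": "user", "content": "Explain linear attention in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```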
+
+ As with the previous models, the model's inherent knowledge and dataset training are inherited from its "parent" model. Consequently, unlike previous RWKV models, which were trained on 100+ languages, the QRWKV model is limited to the approximately 30 languages supported by the Qwen line of models.
+
+ You may find details of the conversion process from our previous release [here](https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1).
+
+ Benchmarks for both the Qwerky-QwQ-32B and Qwerky-72B models are as follows:

| Tasks | Metric | Qwerky-QwQ-32B | Qwen/QwQ-32B | Qwerky-72B | Qwen2.5-72B-Instruct |
|:---:|:---:|:---:|:---:|:---:|:---:|

| winogrande | acc | **0.7324** | 0.7048 | **0.7956** | 0.7632 |
| mmlu | acc | 0.7431 | **0.7985** | 0.7746 | **0.8338** |

+ > *Note: All benchmarks except MMLU are 0-shot and Version 1. For MMLU, it's Version 2.*
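Scores like these are typically produced with EleutherAI's lm-evaluation-harness. The snippet below is a rough sketch of how the 0-shot winogrande number might be reproduced, not the authors' documented setup; the repo id, dtype, and harness configuration are assumptions.

```python
# Rough reproduction sketch using EleutherAI's lm-evaluation-harness (pip install lm-eval).
# Repo id, trust_remote_code, and dtype are assumptions, and this may not match the
# exact harness version/config used for the table above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=recursal/Qwerky-72B,trust_remote_code=True,dtype=bfloat16",
    tasks=["winogrande"],  # 0-shot per the note above; MMLU uses a different setup
    num_fewshot=0,
    batch_size=8,
)
print(results["results"]["winogrande"])
```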