KaraKaraWitch committed
Commit 0a7f8f0 · verified · 1 Parent(s): 9c03dad

Added image. Reworded initial remarks.

Files changed (1): README.md (+10 -5)
README.md CHANGED
@@ -4,12 +4,17 @@ license_name: tongyi-qianwen
 library_name: transformers
 ---
 
-# Qwerky-72B
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/633e85093a17ab61de8d9073/dM-i7n313mUnY-fbmElVM.png)
 
-The following is a model converted from Qwen 2.5 72B, to the RWKV based architecture.
-For existing details of the process from our previous release, find it [here]: https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1
+Linear models offer a promising approach to significantly reducing computational costs at scale, particularly for long context lengths, enabling a >1000x improvement in inference costs and opening the door to o1-style inference-time thinking and wider AI accessibility.
 
-Benchmarks for Qwerky-QwQ-32B and the Qwerky-72B models
+As demonstrated with our Qwerky-72B-Preview and prior models such as QRWKV6-32B Instruct Preview, we have successfully converted Qwen 2.5 72B into an RWKV variant without requiring a pretrain on the base model or retraining the model from scratch. This lets us test and validate the more efficient RWKV linear attention with a much smaller budget. Since our preview, we have continued to refine our technique and have improved the model over the preview iteration.
+
+As with the previous models, the model's inherent knowledge and dataset training are inherited from its "parent" model. Consequently, unlike previous RWKV models trained on 100+ languages, the QRWKV model is limited to the approximately 30 languages supported by the Qwen line of models.
+
+You may find details of the conversion process in our previous release [here](https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1).
+
+Benchmarks are as follows for both the Qwerky-QwQ-32B and Qwerky-72B models:
 
 | Tasks | Metric | Qwerky-QwQ-32B | Qwen/QwQ-32B | Qwerky-72B | Qwen2.5-72B-Instruct |
 |:---:|:---:|:---:|:---:|:---:|:---:|
@@ -22,4 +27,4 @@ Benchmarks for Qwerky-QwQ-32B and the Qwerky-72B models
 | winogrande | acc | **0.7324** | 0.7048 | **0.7956** | 0.7632 |
 | mmlu | acc | 0.7431 | **0.7985** | 0.7746 | **0.8338** |
 
-> All benchmark's besides MMLU are 0 n-shot, and is version 1, MMLU is version 2
+> *Note: All benchmarks except MMLU are 0-shot and Version 1. For MMLU, it's Version 2.*
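
The shot counts and task versions quoted above follow the conventions of EleutherAI's `lm-evaluation-harness`. As a rough illustration only (an assumption, not the exact setup behind this table), a 0-shot winogrande run against one of these checkpoints could look like the sketch below; the repo id is a hypothetical placeholder.

```python
# Minimal sketch: a 0-shot winogrande evaluation with lm-evaluation-harness
# (pip install lm-eval). The repo id below is a placeholder assumption; point it
# at the actual Qwerky checkpoint on the Hub. trust_remote_code is assumed to be
# required because the RWKV-based conversion ships custom modeling code.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # transformers-backed evaluation backend
    model_args="pretrained=recursal/Qwerky-72B,trust_remote_code=True,dtype=bfloat16",
    tasks=["winogrande"],
    num_fewshot=0,   # matches the "0-shot" note above
    batch_size=8,
)

# results["results"] maps each task name to its metric dict (accuracy, stderr, ...)
print(results["results"]["winogrande"])
```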