Commit e07ddf6 by yzmizeyu (verified) · 1 Parent(s): b50d87b

Update README.md

  1. README.md +10 -5
README.md CHANGED
@@ -9,11 +9,15 @@ datasets:
  ---
  ## Introduction

- Sparse computing is increasingly recognized as an important direction to improve the computational efficiency of large language models (LLMs).

- Recent studies ([Zhang et al., 2021](https://arxiv.org/abs/2110.01786); [Liu et al., 2023](https://openreview.net/pdf?id=wIPIhHd00i); [Mirzadeh et al., 2023](https://arxiv.org/abs/2310.04564)) reveal that LLMs inherently exhibit properties conducive to sparse computation when employing the ReLU activation function. This insight opens up new avenues for model efficiency, akin to MoE's selective activation. By dynamically choosing model parameters for computation, we can substantially boost efficiency.

- However, the widespread adoption of ReLU-based models in the LLM field remains limited. Here we introduce a new 7B ReLU-based LLM, Bamboo (GitHub link: [https://github.com/SJTU-IPADS/Bamboo](https://github.com/SJTU-IPADS/Bamboo)), which boasts nearly 85% sparsity and performance levels on par with [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1).

  ## Model Architecture

@@ -75,9 +79,10 @@ Our evaluation is based on the framework lm-evaluation-harness and opencompass.
  | Ours | 0.6389 | 0.7593 | 0.4406 | 0.8217 | 0.5315 | 0.6195 | 0.256 | | |
  | Mistral | 0.6265 | 0.7924 | 0.4262 | 0.8332 | 0.4018 | 0.6143 | 0.2621 | | |

- ## Speed Evaluation Results

- We utilize [PowerInfer](https://arxiv.org/pdf/2312.12456.pdf), a state-of-the-art acceleration framework leveraging activation sparsity. Here we show the inference speed compared with llama.cpp/transformers.

  ## Limitation & Disclaimer

 
  ---
  ## Introduction

+ Sparse computing is increasingly recognized as an important direction to improve the computational efficiency (e.g., inference speed) of large language models (LLMs).

+ Recent studies ([Zhang et al., 2021](https://arxiv.org/abs/2110.01786); [Liu et al., 2023](https://openreview.net/pdf?id=wIPIhHd00i); [Mirzadeh et al., 2023](https://arxiv.org/abs/2310.04564)) reveal that LLMs inherently exhibit properties conducive to sparse computation when employing the ReLU activation function.
+ This insight opens up new avenues for improving inference speed, akin to MoE's selective activation.
+ By dynamically choosing model parameters for computation, we can substantially boost inference speed.

+ However, the widespread adoption of ReLU-based models in the LLM field remains limited.
+ Here we introduce a new 7B ReLU-based LLM, Bamboo (GitHub link: [https://github.com/SJTU-IPADS/Bamboo](https://github.com/SJTU-IPADS/Bamboo)),
+ which boasts nearly 85% sparsity and performance levels on par with [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1).
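
To make the idea of "dynamically choosing model parameters" concrete, here is a minimal NumPy sketch of a ReLU feed-forward layer in which the second projection only reads the weight rows of neurons that ReLU left non-zero. It is an illustration, not Bamboo's or PowerInfer's implementation: the plain two-matrix layer, the function names, the toy sizes, and the random weights are all assumptions made for this example, and random weights will not reproduce the sparsity level quoted above.

```python
import numpy as np

def relu_ffn_dense(x, w_up, w_down):
    """Reference feed-forward pass: every hidden neuron is computed and used."""
    hidden = np.maximum(x @ w_up, 0.0)   # ReLU zeroes out negative pre-activations
    return hidden @ w_down

def relu_ffn_sparse(x, w_up, w_down):
    """Same result, but the down projection only touches rows of active neurons."""
    hidden = np.maximum(x @ w_up, 0.0)
    active = np.nonzero(hidden)[0]       # indices of neurons with non-zero output
    out = hidden[active] @ w_down[active]
    sparsity = 1.0 - active.size / hidden.size
    return out, sparsity

rng = np.random.default_rng(0)
d_model, d_hidden = 16, 64               # toy sizes (assumption), not Bamboo's dimensions
x = rng.standard_normal(d_model)
w_up = rng.standard_normal((d_model, d_hidden))
w_down = rng.standard_normal((d_hidden, d_model))

dense_out = relu_ffn_dense(x, w_up, w_down)
sparse_out, sparsity = relu_ffn_sparse(x, w_up, w_down)
print(f"fraction of inactive neurons: {sparsity:.2f}")
```

Both functions return the same vector up to floating-point rounding, because a neuron whose activation is exactly zero contributes nothing to the output. The sketch still computes the full up projection; saving that work as well would require knowing the active set in advance, which this example does not attempt.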

  ## Model Architecture

 
  | Ours | 0.6389 | 0.7593 | 0.4406 | 0.8217 | 0.5315 | 0.6195 | 0.256 | | |
  | Mistral | 0.6265 | 0.7924 | 0.4262 | 0.8332 | 0.4018 | 0.6143 | 0.2621 | | |

+ ## Inference Speed Evaluation Results

+ We utilize [PowerInfer](https://arxiv.org/pdf/2312.12456.pdf), a state-of-the-art acceleration framework leveraging activation sparsity.
+ Here we show the inference speed compared with llama.cpp/transformers.

  ## Limitation & Disclaimer