
Bamboo-base-v0.1-GGUF

Description

Sparse computing is increasingly recognized as an important direction for improving the computational efficiency (e.g., inference speed) of large language models (LLMs).

Recent studies (Zhang et al., 2021; Liu et al., 2023; Mirzadeh et al., 2023) show that LLMs using the ReLU activation function inherently exhibit properties conducive to sparse computation. This insight opens up new avenues for faster inference, akin to the selective activation used in mixture-of-experts (MoE) models: by dynamically choosing which model parameters take part in a computation, we can substantially boost inference speed.
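To make the idea concrete, here is a minimal NumPy sketch of a single ReLU feed-forward layer. It is not Bamboo's implementation: the layer sizes, the random weights, and the negative bias are all illustrative assumptions chosen so that most neurons are inactive. It shows that rows of the down projection belonging to neurons zeroed by ReLU can be skipped without changing the output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed for illustration; not Bamboo's real sizes).
d_model, d_ff = 64, 256

# Random weights, scaled so pre-activations have roughly unit variance.
W_up = rng.standard_normal((d_ff, d_model)) / np.sqrt(d_model)
W_down = rng.standard_normal((d_model, d_ff))
# Assumed negative bias: pushes most pre-activations below zero.
b_up = -1.0 * np.ones(d_ff)

x = rng.standard_normal(d_model)


def dense_ffn(x):
    """Standard FFN: every neuron participates in the down projection."""
    h = np.maximum(W_up @ x + b_up, 0.0)  # ReLU
    return W_down @ h, h


def sparse_ffn(x):
    """Use only the neurons that ReLU leaves non-zero.

    Note: this sketch still computes all pre-activations to find the
    active set; it only skips work in the down projection.
    """
    pre = W_up @ x + b_up
    active = np.flatnonzero(pre > 0)
    return W_down[:, active] @ pre[active], active


y_dense, h = dense_ffn(x)
y_sparse, active = sparse_ffn(x)

# Sparsity = fraction of neurons whose activation is exactly zero.
sparsity = 1.0 - active.size / d_ff
print(f"active neurons: {active.size}/{d_ff}, sparsity: {sparsity:.2%}")
print("outputs match:", np.allclose(y_dense, y_sparse))
```

The saving here is limited to the down projection. Avoiding the up projection as well would require knowing the active set in advance, which this sketch does not attempt.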

However, adoption of ReLU-based models in the LLM field remains limited. Here we introduce a new 7B ReLU-based LLM, Bamboo (GitHub link: https://github.com/SJTU-IPADS/Bamboo), which achieves nearly 85% sparsity with performance on par with Mistral-7B.

Citation

Please cite using the following BibTeX entry:

@misc{bamboo,
    title={Bamboo: Harmonizing Sparsity and Performance in Large Language Models},
    author={Yixin Song and Haotong Xie and Zeyu Mi and Li Ma and Haibo Chen},
    year={2024}
}
GGUF

Model size: 7.24B params
Architecture: llama
Available quantizations: 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
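
As a rough guide to choosing a quantization level, the sketch below estimates a nominal weight footprint from the parameter count and bit widths listed above, using parameters × bits / 8. This is only an approximation under the assumption that every weight is stored at exactly the nominal bit width. Real GGUF files are typically somewhat larger, because quantization formats also store scaling data and may keep some tensors at higher precision. The function name `nominal_size_gb` is hypothetical.

```python
# Parameter count and bit widths as listed on this model card.
N_PARAMS = 7.24e9
BIT_WIDTHS = [3, 4, 5, 6, 8]


def nominal_size_gb(n_params: float, bits: int) -> float:
    """Nominal weight size in GB (10^9 bytes), ignoring format overhead."""
    return n_params * bits / 8 / 1e9


sizes = {bits: nominal_size_gb(N_PARAMS, bits) for bits in BIT_WIDTHS}

for bits, gb in sizes.items():
    print(f"{bits}-bit: ~{gb:.2f} GB")
```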
