BerenMillidge committed (verified) · Commit 9570f91 · 1 Parent(s): 4074c1c

Update README.md

Files changed (1): README.md (+8 -20)
README.md CHANGED
@@ -3,7 +3,7 @@ license: apache-2.0
---
# Model Card for Zamba v2 2.7B

- Zamba-2-2.7B is a hybrid model between state-space models and transformers. It broadly follows the [Zamba architecture](https://huggingface.co/Zyphra/Zamba-7B-v1) which consists of a Mamba backbone alternating with shared transformer blocks. Zamba-2-2.7B possesses three major improvements over Zamba1:

1.) Mamba1 blocks have been replaced with Mamba2 blocks.
2.) Instead of a single shared attention block, we utilize two shared attention blocks which are interleaved in an ABAB pattern through the network.
@@ -55,35 +55,23 @@ Zamba2-2.7B utilizes and extends our original Zamba hybrid SSM-attention archite
</center>


- ## Performance [to update!]

- We find that Zamba performs significantly better than existing open models (with open datasets and training details) at this scale. However, it performs slightly worse than the leading open-weight models at the 7B scale. Most of this difference derives from MMLU and reasoning evaluations. Zamba, however, is trained on significantly fewer tokens than these models and is the most sample efficient model in terms of performance per training tokens.


<center>
- <img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/FG73iXpiDGSX_opbDJxKo.png" width="700" alt="Zamba performance">
</center>

-
- Due to its SSM architecture, Zamba is extremely efficient in inference, substantially outperforming comparable 7B and 8B models in inference latency as well as memory cost of generation due to its substantially diminished KV cache.

<center>
- <img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/cghYPnDbdzweT1b2RyiXA.png" width="400" alt="Zamba performance">
</center>

- ## Citation
-
- If you find Zamba useful in your work please cite it as:
-
- ```
- @article{glorioso2024zamba,
-   title={Zamba: A Compact 7B SSM Hybrid Model},
-   author={Glorioso, Paolo and Anthony, Quentin and Tokpanov, Yury and Whittington, James and Pilault, Jonathan and Ibrahim, Adam and Millidge, Beren},
-   journal={arXiv preprint arXiv:2405.16712},
-   year={2024}
- }
- ```

## Notice

- Zamba2-2.7B is a pretrained base model and therefore does not have any moderation mechanism. In addition, one should not expect good chat performance, as this model was not fine-tuned for chat.
 
---
# Model Card for Zamba v2 2.7B

+ Zamba-2-2.7B is a hybrid model combining state-space models and transformers. It broadly follows the [Zamba architecture](https://arxiv.org/abs/2405.16712), which consists of a Mamba backbone alternating with shared transformer blocks. Zamba-2-2.7B introduces three major improvements over Zamba1:

1.) Mamba1 blocks have been replaced with Mamba2 blocks.
  2.) Instead of a single shared attention block, we utilize two shared attention blocks which are interleaved in an ABAB pattern through the network.
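
To make the layout described above concrete, here is a minimal PyTorch sketch of the weight-sharing pattern only: two attention blocks whose parameters are reused in an A, B, A, B order over a backbone of Mamba-style blocks. The block internals, sizes, and placement period are illustrative assumptions, not the Zamba2 implementation (a gated MLP stands in for the Mamba2 mixer).

```python
# Illustrative sketch of an ABAB shared-attention hybrid backbone.
# All module names, sizes, and the placement period are assumptions for illustration.
import torch
import torch.nn as nn


class MambaLikeBlock(nn.Module):
    """Placeholder for a Mamba2 block (a gated MLP stands in for the SSM mixer)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, gate = self.in_proj(self.norm(x)).chunk(2, dim=-1)
        return x + self.out_proj(h * torch.sigmoid(gate))


class SharedAttentionBlock(nn.Module):
    """A standard self-attention block whose weights are reused at several depths."""
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out


class HybridBackbone(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4,
                 n_mamba_blocks: int = 12, shared_every: int = 3):
        super().__init__()
        self.mamba_blocks = nn.ModuleList(
            [MambaLikeBlock(d_model) for _ in range(n_mamba_blocks)])
        # Only two attention blocks exist; they are applied as A, B, A, B, ...
        self.shared_attn = nn.ModuleList(
            [SharedAttentionBlock(d_model, n_heads) for _ in range(2)])
        self.shared_every = shared_every

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shared_calls = 0
        for i, block in enumerate(self.mamba_blocks):
            x = block(x)
            if (i + 1) % self.shared_every == 0:
                # Alternate between the two shared blocks: A (index 0), B (index 1), A, ...
                x = self.shared_attn[shared_calls % 2](x)
                shared_calls += 1
        return x


if __name__ == "__main__":
    model = HybridBackbone()
    tokens = torch.randn(2, 16, 256)  # (batch, sequence, d_model)
    print(model(tokens).shape)        # torch.Size([2, 16, 256])
```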
 
</center>


+ ## Performance

+ Zamba2-2.7B achieves state-of-the-art performance among models under 3B parameters and is competitive with some models of significantly greater size. Moreover, due to its unique hybrid SSM architecture, Zamba2-2.7B achieves extremely low inference latency and rapid generation with a significantly smaller memory footprint than comparable transformer-based models.

+ Zamba2-2.7B's high performance and small compute and memory footprint render it an ideal generalist model for on-device applications.
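
The memory claim rests on the KV cache: Mamba-style layers carry a fixed-size state, so only the few attention applications in a hybrid contribute a per-token cache during generation. A rough back-of-envelope sketch, with all layer counts, head counts, and dimensions chosen as hypothetical round numbers rather than Zamba2-2.7B's real configuration:

```python
# Back-of-envelope sketch of generation-time KV-cache memory. All numbers below are
# hypothetical round figures, not Zamba2-2.7B's actual configuration.

def kv_cache_bytes(attn_applications: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Keys + values cached for every attention application along the depth."""
    return 2 * attn_applications * n_kv_heads * head_dim * seq_len * bytes_per_value

seq_len = 4096
# Hypothetical pure transformer: attention (and hence a KV cache) in every one of 32 layers.
dense = kv_cache_bytes(attn_applications=32, n_kv_heads=32, head_dim=80, seq_len=seq_len)
# Hypothetical hybrid: attention applied at only 6 points in depth; the Mamba layers
# keep a fixed-size state instead of a per-token cache.
hybrid = kv_cache_bytes(attn_applications=6, n_kv_heads=32, head_dim=80, seq_len=seq_len)

print(f"dense transformer KV cache : {dense / 2**20:.0f} MiB")
print(f"hybrid SSM-attention cache : {hybrid / 2**20:.0f} MiB")
```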

<center>
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/U7VD9PYLj3XcEjgV08sP5.png" width="700" alt="Zamba performance">
</center>

+ (TODO: add figure with all evals)

<center>
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/Y_X1hc4UwXLwrttyQpaxY.png" width="700" alt="Zamba inference and memory cost">
</center>


## Notice

+ Zamba2-2.7B is a pretrained base model and therefore does not have any moderation mechanism and may output toxic or otherwise harmful language. In addition, one should not expect strong instruction-following or chat performance, as this model was not fine-tuned for instruction following or chat.
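
For completeness, a hedged quick-start that treats the checkpoint as a plain completion model, in line with the notice above. It assumes the repository id is Zyphra/Zamba2-2.7B and that the installed transformers build supports the Zamba2 architecture; if it does not, follow whatever install instructions the repository gives.

```python
# Hypothetical quick-start: plain text completion (no chat template), per the notice above.
# Assumes the checkpoint id "Zyphra/Zamba2-2.7B" and a transformers version with Zamba2 support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/Zamba2-2.7B"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "A hybrid of state-space models and attention is useful because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```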