Update README.md
README.md CHANGED
@@ -3,7 +3,7 @@ license: apache-2.0
 ---
 # Model Card for Zamba v2 2.7B
 
-Zamba-2-2.7B is a hybrid model between state-space models and transformers. It broadly follows the [Zamba architecture](https://…
+Zamba-2-2.7B is a hybrid model between state-space models and transformers. It broadly follows the [Zamba architecture](https://arxiv.org/abs/2405.16712) which consists of a Mamba backbone alternating with shared transformer blocks. Zamba-2-2.7B possesses three major improvements over Zamba1:
 
 1.) Mamba1 blocks have been replaced with Mamba2 blocks.
 2.) Instead of a single shared attention block, we utilize two shared attention blocks which are interleaved in an ABAB pattern through the network.
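To make the ABAB sharing pattern described in this hunk concrete, here is a minimal PyTorch-style sketch of the idea: a backbone of Mamba2-style blocks with only two attention blocks in total, whose parameters are reused at alternating positions through the stack. The block classes, layer counts, and the `attn_every` spacing are illustrative placeholders, not Zamba2's actual implementation or hyperparameters.

```python
import torch
import torch.nn as nn


class Mamba2Block(nn.Module):
    """Stand-in for a Mamba2 (SSM) block; the real mixer is far more involved."""

    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mixer = nn.Linear(dim, dim)  # placeholder for the SSM scan

    def forward(self, x):
        return x + self.mixer(self.norm(x))


class SharedAttentionBlock(nn.Module):
    """Stand-in self-attention block whose parameters are reused across depth."""

    def __init__(self, dim, n_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return out


class ABABBackbone(nn.Module):
    """Mamba backbone that applies two shared attention blocks in an ABAB pattern."""

    def __init__(self, dim, n_mamba_blocks=12, attn_every=3):
        super().__init__()
        self.mamba_blocks = nn.ModuleList(Mamba2Block(dim) for _ in range(n_mamba_blocks))
        # Only two attention blocks exist in total; their weights are reused
        # by every attention application along the depth of the network.
        self.attn_a = SharedAttentionBlock(dim)
        self.attn_b = SharedAttentionBlock(dim)
        self.attn_every = attn_every

    def forward(self, x):
        use_a = True
        for i, mamba in enumerate(self.mamba_blocks, start=1):
            x = mamba(x)
            if i % self.attn_every == 0:
                shared = self.attn_a if use_a else self.attn_b  # A, B, A, B, ...
                x = x + shared(x)
                use_a = not use_a
        return x


if __name__ == "__main__":
    model = ABABBackbone(dim=64)
    y = model(torch.randn(2, 16, 64))  # (batch, sequence, dim)
    print(y.shape)  # torch.Size([2, 16, 64])
```

The essential property the sketch shows is that attention parameters do not grow with depth: every attention application reuses one of the same two blocks, alternating A, B, A, B through the network.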
@@ -55,35 +55,23 @@ Zamba2-2.7B utilizes and extends our original Zamba hybrid SSM-attention architecture
 </center>
 
 
 ## Performance
 
+Zamba2-2.7B achieves leading and state-of-the-art performance among models of <3B parameters and is competitive with some models of significantly greater size. Moreover, due to its unique hybrid SSM architecture, Zamba2-2.7B achieves extremely low latency and rapid generation with a significantly smaller memory footprint than comparable transformer-based models.
 
+Zamba2-2.7B's high performance and small compute and memory footprint render it an ideal generalist model for on-device applications.
 
 <center>
-<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/…
+<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/U7VD9PYLj3XcEjgV08sP5.png" width="700" alt="Zamba performance">
 </center>
 
-Due to its SSM architecture, Zamba is extremely efficient in inference, substantially outperforming comparable 7B and 8B models in inference latency as well as memory cost of generation due to its substantially diminished KV cache.
+(TODO: all-eval figure)
 
 <center>
-<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/…
+<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/Y_X1hc4UwXLwrttyQpaxY.png" width="700" alt="Zamba inference and memory cost">
 </center>
 
-## Citation
-
-If you find Zamba useful in your work please cite it as:
-
-```
-@article{glorioso2024zamba,
-  title={Zamba: A Compact 7B SSM Hybrid Model},
-  author={Glorioso, Paolo and Anthony, Quentin and Tokpanov, Yury and Whittington, James and Pilault, Jonathan and Ibrahim, Adam and Millidge, Beren},
-  journal={arXiv preprint arXiv:2405.16712},
-  year={2024}
-}
-```
 
 ## Notice
 
-Zamba2-2.7B is a pretrained base model and therefore does not have any moderation mechanism. In addition, one should not expect good chat performance, as this model was not fine-tuned for chat.
+Zamba2-2.7B is a pretrained base model and therefore does not have any moderation mechanism and may output toxic or otherwise harmful language. In addition, one should not expect good instruct or chat performance, as this model was not fine-tuned for instruction following or chat.
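The KV-cache and memory claims in this hunk lend themselves to a quick back-of-the-envelope check. Generation memory for attention is dominated by the cached keys and values, which scale with the number of layers that actually perform attention; a hybrid that interleaves a few shared attention blocks among Mamba blocks therefore caches far less than a transformer with attention at every layer. The configurations below are illustrative assumptions only, not the real hyperparameters of Zamba2-2.7B or of any specific 7B/8B baseline.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Size of the KV cache: keys + values (factor 2) for every attention layer."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


seq = 4096  # tokens held in cache during generation

# Hypothetical dense transformer: attention (and a KV cache entry) in all 32 layers.
dense = kv_cache_bytes(n_attn_layers=32, n_kv_heads=32, head_dim=128, seq_len=seq)

# Hypothetical hybrid: attention applied at only 6 positions, Mamba blocks elsewhere.
hybrid = kv_cache_bytes(n_attn_layers=6, n_kv_heads=32, head_dim=128, seq_len=seq)

print(f"dense transformer KV cache: {dense / 2**30:.2f} GiB")
print(f"hybrid SSM-attention cache: {hybrid / 2**30:.2f} GiB")
```

The Mamba blocks themselves keep only a constant-size recurrent state per layer, so the gap widens as the generated sequence grows.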
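Since the Notice describes a pretrained base model with no chat or instruction tuning, the appropriate way to exercise it is plain text completion rather than a chat template. Below is a hedged loading sketch using Hugging Face `transformers`; the repository id `Zyphra/Zamba2-2.7B`, the dtype and device settings, and the assumption that your installed `transformers` build supports the Zamba2 architecture should all be verified against the model repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id -- confirm on the model page before use.
model_id = "Zyphra/Zamba2-2.7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 2.7B weights small in memory
    device_map="auto",
)

# Plain completion prompt: no chat template, since this is a base model.
prompt = "A hybrid of state-space models and attention is attractive because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```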