Update README.md
README.md CHANGED
@@ -3,7 +3,7 @@ license: apache-2.0
 ---
 # Model Card for Zamba v2 2.7B
 
-Zamba-2-2.7B is a hybrid model between state-space models and transformers. It broadly follows the [Zamba architecture](https://…
+Zamba-2-2.7B is a hybrid model between state-space models and transformers. It broadly follows the [Zamba architecture](https://arxiv.org/abs/2405.16712) which consists of a Mamba backbone alternating with shared transformer blocks. Zamba-2-2.7B possesses three major improvements over Zamba1:
 
 1.) Mamba1 blocks have been replaced with Mamba2 blocks.
 2.) Instead of a single shared attention block, we utilize two shared attention blocks which are interleaved in an ABAB pattern through the network.
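To make the ABAB sharing pattern described in this hunk concrete, here is a minimal PyTorch-style sketch of the idea: a backbone of Mamba2-style blocks with only two attention blocks in total, whose parameters are reused at alternating positions through the stack. The block classes, layer counts, and the `attn_every` spacing are illustrative placeholders, not Zamba2's actual implementation or hyperparameters.

```python
import torch
import torch.nn as nn


class Mamba2Block(nn.Module):
    """Stand-in for a Mamba2 (SSM) block; the real mixer is far more involved."""

    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mixer = nn.Linear(dim, dim)  # placeholder for the SSM scan

    def forward(self, x):
        return x + self.mixer(self.norm(x))


class SharedAttentionBlock(nn.Module):
    """Stand-in self-attention block whose parameters are reused across depth."""

    def __init__(self, dim, n_heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return out


class ABABBackbone(nn.Module):
    """Mamba backbone that applies two shared attention blocks in an ABAB pattern."""

    def __init__(self, dim, n_mamba_blocks=12, attn_every=3):
        super().__init__()
        self.mamba_blocks = nn.ModuleList(Mamba2Block(dim) for _ in range(n_mamba_blocks))
        # Only two attention blocks exist in total; their weights are reused
        # by every attention application along the depth of the network.
        self.attn_a = SharedAttentionBlock(dim)
        self.attn_b = SharedAttentionBlock(dim)
        self.attn_every = attn_every

    def forward(self, x):
        use_a = True
        for i, mamba in enumerate(self.mamba_blocks, start=1):
            x = mamba(x)
            if i % self.attn_every == 0:
                shared = self.attn_a if use_a else self.attn_b  # A, B, A, B, ...
                x = x + shared(x)
                use_a = not use_a
        return x


if __name__ == "__main__":
    model = ABABBackbone(dim=64)
    y = model(torch.randn(2, 16, 64))  # (batch, sequence, dim)
    print(y.shape)  # torch.Size([2, 16, 64])
```

The essential property the sketch shows is that attention parameters do not grow with depth: every attention application reuses one of the same two blocks, alternating A, B, A, B through the network.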
@@ -55,35 +55,23 @@ Zamba2-2.7B utilizes and extends our original Zamba hybrid SSM-attention architecture
 </center>
 
 
 ## Performance
 
+Zamba2-2.7B achieves leading and state-of-the-art performance among models of <3B parameters and is competitive with some models of significantly greater size. Moreover, due to its unique hybrid SSM architecture, Zamba2-2.7B achieves extremely low latency and rapid generation with a significantly smaller memory footprint than comparable transformer-based models.
 
+Zamba2-2.7B's high performance and small compute and memory footprint render it an ideal generalist model for on-device applications.
 
 <center>
-<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/…
+<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/U7VD9PYLj3XcEjgV08sP5.png" width="700" alt="Zamba performance">
 </center>
 
-Due to its SSM architecture, Zamba is extremely efficient in inference, substantially outperforming comparable 7B and 8B models in inference latency as well as memory cost of generation due to its substantially diminished KV cache.
+(TODO: all-eval figure)
 
 <center>
-<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/…
+<img src="https://cdn-uploads.huggingface.co/production/uploads/65c05e75c084467acab2f84a/Y_X1hc4UwXLwrttyQpaxY.png" width="700" alt="Zamba inference and memory cost">
 </center>
 
-## Citation
-
-If you find Zamba useful in your work please cite it as:
-
-```
-@article{glorioso2024zamba,
-  title={Zamba: A Compact 7B SSM Hybrid Model},
-  author={Glorioso, Paolo and Anthony, Quentin and Tokpanov, Yury and Whittington, James and Pilault, Jonathan and Ibrahim, Adam and Millidge, Beren},
-  journal={arXiv preprint arXiv:2405.16712},
-  year={2024}
-}
-```
 
 ## Notice
 
-Zamba2-2.7B is a pretrained base model and therefore does not have any moderation mechanism. In addition, one should not expect good chat performance, as this model was not fine-tuned for chat.
+Zamba2-2.7B is a pretrained base model and therefore does not have any moderation mechanism and may output toxic or otherwise harmful language. In addition, one should not expect good instruct or chat performance, as this model was not fine-tuned for instruction following or chat.
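The KV-cache and memory claims in this hunk lend themselves to a quick back-of-the-envelope check. Generation memory for attention is dominated by the cached keys and values, which scale with the number of layers that actually perform attention; a hybrid that interleaves a few shared attention blocks among Mamba blocks therefore caches far less than a transformer with attention at every layer. The configurations below are illustrative assumptions only, not the real hyperparameters of Zamba2-2.7B or of any specific 7B/8B baseline.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Size of the KV cache: keys + values (factor 2) for every attention layer."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


seq = 4096  # tokens held in cache during generation

# Hypothetical dense transformer: attention (and a KV cache entry) in all 32 layers.
dense = kv_cache_bytes(n_attn_layers=32, n_kv_heads=32, head_dim=128, seq_len=seq)

# Hypothetical hybrid: attention applied at only 6 positions, Mamba blocks elsewhere.
hybrid = kv_cache_bytes(n_attn_layers=6, n_kv_heads=32, head_dim=128, seq_len=seq)

print(f"dense transformer KV cache: {dense / 2**30:.2f} GiB")
print(f"hybrid SSM-attention cache: {hybrid / 2**30:.2f} GiB")
```

The Mamba blocks themselves keep only a constant-size recurrent state per layer, so the gap widens as the generated sequence grows.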
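Since the Notice describes a pretrained base model with no chat or instruction tuning, the appropriate way to exercise it is plain text completion rather than a chat template. Below is a hedged loading sketch using Hugging Face `transformers`; the repository id `Zyphra/Zamba2-2.7B`, the dtype and device settings, and the assumption that your installed `transformers` build supports the Zamba2 architecture should all be verified against the model repository.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id -- confirm on the model page before use.
model_id = "Zyphra/Zamba2-2.7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 2.7B weights small in memory
    device_map="auto",
)

# Plain completion prompt: no chat template, since this is a base model.
prompt = "A hybrid of state-space models and attention is attractive because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```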