Commit 8dd106c by BerenMillidge
Parent: 88a4c87

Update README.md

README.md CHANGED
@@ -7,6 +7,8 @@ Zamba-7B-v1 is a hybrid model between Mamba, a state-space model, and transforme
 
 Note: the current Huggingface implementation of Zamba performs slower than our internal implementation. We are working to fix this with the Huggingface team.
 
+Our technical report describing the training of Zamba is available [here](https://arxiv.org/abs/2405.16712)
+
 ## Quick start
 
 ### Prerequisites
@@ -43,6 +45,17 @@ outputs = model.generate(**input_ids, max_new_tokens=100)
 print(tokenizer.decode(outputs[0]))
 ```
 
+## Citation
+
+If you find Zamba useful in your work please cite it as:
+
+@article{glorioso2024zamba,
+title={Zamba: A Compact 7B SSM Hybrid Model},
+author={Glorioso, Paolo and Anthony, Quentin and Tokpanov, Yury and Whittington, James and Pilault, Jonathan and Ibrahim, Adam and Millidge, Beren},
+journal={arXiv preprint arXiv:2405.16712},
+year={2024}
+}
+
 ## Notice
 
 Zamba is a pretrained base model and therefore does not have any moderation mechanism. In addition, one should not expect good chat performance, as this model was not fine-tuned for chat.