Commit 630c127 by BerenMillidge · Parent(s): 1970b61

Update README.md

README.md CHANGED
@@ -1,9 +1,9 @@
 ---
 license: apache-2.0
 ---
-# Model Card for Zamba
+# Model Card for Zamba 7B
 
-Zamba-7B-v1 is a hybrid between state-space
+Zamba-7B-v1 is a hybrid model between Mamba, a state-space model, and transformers. It uses a Mamba backbone with a shared transformer layer every 6 blocks. Zamba was trained using next-token prediction and uses the Mistral v0.1 tokenizer. We arrived at this architecture after a series of ablations at small scales. Zamba-7B-v1 was pre-trained on 1T tokens of text and code data sourced from open web datasets. Subsequently, in a second phase, Zamba was annealed on a mixture of 50B high-quality tokens.
 
 ## Quick start
 
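The updated card describes a Mamba backbone in which a single shared transformer layer is re-applied every 6 blocks. A minimal sketch of that layer layout follows; this is an illustration of the described pattern, not Zyphra's implementation, and the block count and layer names are hypothetical.

```python
# Illustrative sketch (not Zyphra's code): a Mamba backbone where one
# *shared* transformer (attention) layer is inserted after every 6 Mamba
# blocks. Each "shared_attn" entry denotes the same weights being reused.
def layer_plan(num_mamba_blocks: int, shared_every: int = 6) -> list[str]:
    """Return the layer sequence for a hypothetical backbone depth."""
    plan = []
    for i in range(1, num_mamba_blocks + 1):
        plan.append(f"mamba_{i}")
        if i % shared_every == 0:
            plan.append("shared_attn")  # reuse of the single shared layer
    return plan

print(layer_plan(12))
```

With 12 Mamba blocks, the shared transformer layer appears twice (after blocks 6 and 12); whether the real model counts the shared layer toward its depth is not specified in the card.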