---
license: apache-2.0
---
# Model Card for Zamba 7B

Zamba-7B-v1 is a hybrid model between Mamba, a state-space model, and transformers. It uses a mamba backbone with a shared transformer layer every 6 blocks. Zamba was trained using next-token prediction. It uses the Mistral v0.1 tokenizer. We came to this architecture after a series of ablations at small scales. Zamba-7B-v1 was pre-trained on 1T tokens of text and code data sourced from open web-datasets. Subsequently in a second phase, Zamba was annealed on a mixture of 50B high-quality tokens.

Note: the current Huggingface implementation of Zamba performs slower than our internal implementation. We are working to fix this with the Huggingface team.

Our technical report describing the training of Zamba is available [here](https://arxiv.org/abs/2405.16712)

## Quick start

### Presequities

To download Zamba, clone Zyphra's fork of transformers:
1. `git clone https://github.com/Zyphra/transformers_zamba`
2. `cd transformers_zamba`
3. Install the repository: `pip install -e .`


In order to run optimized Mamba implementations on a CUDA device, you need to install `mamba-ssm` and `causal-conv1d`:
```bash
pip install mamba-ssm causal-conv1d>=1.2.0
```

You can run the model without using the optimized Mamba kernels, but it is **not** recommended as it will result in significantly higher latency. 

To run on CPU, please specify `use_mamba_kernels=False` when loading the model using ``AutoModelForCausalLM.from_pretrained``.


### Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba-7B-v1")
model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba-7B-v1", device_map="auto", torch_dtype=torch.bfloat16)

input_text = "A funny prompt would be "
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```

## Citation

If you find Zamba useful in your work please cite it as:

@article{glorioso2024zamba,
  title={Zamba: A Compact 7B SSM Hybrid Model},
  author={Glorioso, Paolo and Anthony, Quentin and Tokpanov, Yury and Whittington, James and Pilault, Jonathan and Ibrahim, Adam and Millidge, Beren},
  journal={arXiv preprint arXiv:2405.16712},
  year={2024}
}

## Notice

Zamba is a pretrained base model and therefore does not have any moderation mechanism. In addition, one should not expect good chat performance, as this model was not fine-tuned for chat.