Update README.md
README.md
CHANGED
@@ -1,9 +1,9 @@
 ---
 license: apache-2.0
 ---
-# Model Card for Zamba v2
+# Model Card for Zamba v2 2.7B

-
+Zamba-2-2.7B is a hybrid model between state-space models and transformers. It broadly follows the [Zamba architecture](https://huggingface.co/Zyphra/Zamba-7B-v1) which consists of a Mamba backbone alternating with shared transformer blocks. Zamba-2-2.7B possesses three major improvements over Zamba1:

 1.) Mamba1 blocks have been replaced with Mamba2 blocks.
 2.) Instead of a single shared attention block, we utilize two shared attention blocks which are interleaved in an ABAB pattern through the network.
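The ABAB sharing scheme in the added description can be pictured with a short sketch. This is an illustrative toy, not Zamba2's actual code: the class name `SharedAttentionStack`, the use of `nn.Linear` as a stand-in for the Mamba2 blocks, and the per-layer residual wiring are all assumptions made for clarity; only the idea of two shared attention blocks reused in an A, B, A, B order comes from the text above.

```python
import torch
import torch.nn as nn

class SharedAttentionStack(nn.Module):
    """Toy illustration of the ABAB sharing pattern (hypothetical, not Zamba2's real layers)."""

    def __init__(self, num_layers: int = 8, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        # Stand-ins for the Mamba2 backbone blocks (a real model would use Mamba2 layers here).
        self.backbone = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(num_layers))
        # Only two attention blocks exist; their parameters are reused across the whole depth.
        self.shared_attn = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True) for _ in range(2)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i, block in enumerate(self.backbone):
            x = x + block(x)
            attn = self.shared_attn[i % 2]  # A, B, A, B, ... through the network
            attn_out, _ = attn(x, x, x)
            x = x + attn_out
        return x
```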
@@ -21,16 +21,8 @@ To download Zamba 3B, clone Zyphra's fork of transformers:
 1. `git clone https://github.com/Zyphra/transformers_zamba2.git`
 2. `cd transformers_zamba2`
 3. Install the repository: `pip install -e .`
-4. `git clone https://github.com/Zyphra/zamba2_torch.git`
-5. `cd zamba2_torch`
-6. Install the repository: `pip install -e .`


-In order to run optimized Mamba2 implementations on a CUDA device, you need to install `mamba-ssm` and `causal-conv1d`:
-```bash
-pip install mamba-ssm causal-conv1d
-```
-
 You can run the model without using the optimized Mamba kernels, but it is **not** recommended as it will result in significantly higher latency.

 To run on CPU, please specify `use_mamba_kernels=False` when loading the model using ``AutoModelForCausalLM.from_pretrained``.
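As a concrete illustration of the CPU path mentioned in the context lines above, here is a minimal loading sketch. It assumes the `Zyphra/Zamba2-2.7B` checkpoint named later in this diff and that `use_mamba_kernels=False` is passed directly as a `from_pretrained` keyword, as the README text states; the `torch.float32` dtype is an assumption for CPU, not something the README specifies.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Sketch: load without the optimized Mamba kernels so the model can run on CPU.
# Expect noticeably higher latency than the CUDA + mamba-ssm/causal-conv1d setup.
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B")
model = AutoModelForCausalLM.from_pretrained(
    "Zyphra/Zamba2-2.7B",
    use_mamba_kernels=False,    # flag named in the README for running without the kernels
    torch_dtype=torch.float32,  # assumption: plain float32 for CPU execution
)
```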
@@ -42,8 +34,8 @@ To run on CPU, please specify `use_mamba_kernels=False` when loading the model u
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch

-tokenizer = AutoTokenizer.from_pretrained("Zyphra/
-model = AutoModelForCausalLM.from_pretrained("Zyphra/
+tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B")
+model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B", device_map="auto", torch_dtype=torch.bfloat16)

 input_text = "A funny prompt would be "
 input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
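The usage hunk ends after tokenizing the prompt. A hedged continuation, showing how generation would typically proceed with the standard `generate`/`decode` API; the `max_new_tokens` value is an arbitrary choice, not taken from the README.

```python
# Continuation sketch: generate from the tokenized prompt and decode the result.
outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```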