pglo committed · Commit 51d1471 · verified · 1 Parent(s): b274e6c

Update README.md

Files changed (1): README.md +4 -12
README.md CHANGED
@@ -1,9 +1,9 @@
 ---
 license: apache-2.0
 ---
-# Model Card for Zamba v2 3B
+# Model Card for Zamba v2 2.7B
 
-Zamba2-2.7B is a hybrid model between state-space models and transformers. It broadly follows the [Zamba architecture](https://huggingface.co/Zyphra/Zamba-7B-v1) which consists of a Mamba backbone alternating with shared transformer blocks. Zamba2-2.7B possesses three major improvements over Zamba1:
+Zamba-2-2.7B is a hybrid model between state-space models and transformers. It broadly follows the [Zamba architecture](https://huggingface.co/Zyphra/Zamba-7B-v1) which consists of a Mamba backbone alternating with shared transformer blocks. Zamba-2-2.7B possesses three major improvements over Zamba1:
 
 1.) Mamba1 blocks have been replaced with Mamba2 blocks.
 2.) Instead of a single shared attention block, we utilize two shared attention blocks which are interleaved in an ABAB pattern through the network.
@@ -21,16 +21,8 @@ To download Zamba 3B, clone Zyphra's fork of transformers:
 1. `git clone https://github.com/Zyphra/transformers_zamba2.git`
 2. `cd transformers_zamba2`
 3. Install the repository: `pip install -e .`
-4. `git clone https://github.com/Zyphra/zamba2_torch.git`
-5. `cd zamba2_torch`
-6. Install the repository: `pip install -e .`
 
 
-In order to run optimized Mamba2 implementations on a CUDA device, you need to install `mamba-ssm` and `causal-conv1d`:
-```bash
-pip install mamba-ssm causal-conv1d
-```
-
 You can run the model without using the optimized Mamba kernels, but it is **not** recommended as it will result in significantly higher latency.
 
 To run on CPU, please specify `use_mamba_kernels=False` when loading the model using ``AutoModelForCausalLM.from_pretrained``.
@@ -42,8 +34,8 @@ To run on CPU, please specify `use_mamba_kernels=False` when loading the model u
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
 
-tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba-3B-v2")
-model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba-3B-v2", device_map="auto", torch_dtype=torch.bfloat16)
+tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B")
+model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B", device_map="auto", torch_dtype=torch.bfloat16)
 
 input_text = "A funny prompt would be "
 input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
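The final hunk truncates before the generation step. A minimal end-to-end sketch of the updated example, assuming the fork exposes the standard `generate` API (the `max_new_tokens` value here is illustrative, not from the README):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the renamed checkpoint from the updated README (CUDA path).
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B")
model = AutoModelForCausalLM.from_pretrained(
    "Zyphra/Zamba2-2.7B", device_map="auto", torch_dtype=torch.bfloat16
)

input_text = "A funny prompt would be "
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

# Standard Hugging Face generation call; max_new_tokens is an assumed value.
outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```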
 
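For the CPU path the README keeps after this commit, a sketch assuming `use_mamba_kernels=False` is passed to `from_pretrained` as the README instructs; expect noticeably higher latency without the optimized kernels:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# CPU fallback: disable the optimized Mamba kernels, per the README.
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B")
model = AutoModelForCausalLM.from_pretrained(
    "Zyphra/Zamba2-2.7B", use_mamba_kernels=False
)

input_ids = tokenizer("A funny prompt would be ", return_tensors="pt")
# Generation is much slower here than on the CUDA path.
outputs = model.generate(**input_ids, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))
```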