Update README.md
README.md
CHANGED
@@ -1,9 +1,9 @@
 ---
 license: apache-2.0
 ---
-# Model Card for Zamba v2
+# Model Card for Zamba v2 2.7B

-
+Zamba-2-2.7B is a hybrid model between state-space models and transformers. It broadly follows the [Zamba architecture](https://huggingface.co/Zyphra/Zamba-7B-v1) which consists of a Mamba backbone alternating with shared transformer blocks. Zamba-2-2.7B possesses three major improvements over Zamba1:

 1.) Mamba1 blocks have been replaced with Mamba2 blocks.
 2.) Instead of a single shared attention block, we utilize two shared attention blocks which are interleaved in an ABAB pattern through the network.
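The ABAB sharing scheme in the added description can be pictured with a short sketch. This is an illustrative toy, not Zamba2's actual code: the class name `SharedAttentionStack`, the use of `nn.Linear` as a stand-in for the Mamba2 blocks, and the per-layer residual wiring are all assumptions made for clarity; only the idea of two shared attention blocks reused in an A, B, A, B order comes from the text above.

```python
import torch
import torch.nn as nn

class SharedAttentionStack(nn.Module):
    """Toy illustration of the ABAB sharing pattern (hypothetical, not Zamba2's real layers)."""

    def __init__(self, num_layers: int = 8, d_model: int = 64, n_heads: int = 4):
        super().__init__()
        # Stand-ins for the Mamba2 backbone blocks (a real model would use Mamba2 layers here).
        self.backbone = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(num_layers))
        # Only two attention blocks exist; their parameters are reused across the whole depth.
        self.shared_attn = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True) for _ in range(2)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i, block in enumerate(self.backbone):
            x = x + block(x)
            attn = self.shared_attn[i % 2]  # A, B, A, B, ... through the network
            attn_out, _ = attn(x, x, x)
            x = x + attn_out
        return x
```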
@@ -21,16 +21,8 @@ To download Zamba 3B, clone Zyphra's fork of transformers:
 1. `git clone https://github.com/Zyphra/transformers_zamba2.git`
 2. `cd transformers_zamba2`
 3. Install the repository: `pip install -e .`
-4. `git clone https://github.com/Zyphra/zamba2_torch.git`
-5. `cd zamba2_torch`
-6. Install the repository: `pip install -e .`


-In order to run optimized Mamba2 implementations on a CUDA device, you need to install `mamba-ssm` and `causal-conv1d`:
-```bash
-pip install mamba-ssm causal-conv1d
-```
-
 You can run the model without using the optimized Mamba kernels, but it is **not** recommended as it will result in significantly higher latency.

 To run on CPU, please specify `use_mamba_kernels=False` when loading the model using ``AutoModelForCausalLM.from_pretrained``.
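As a concrete illustration of the CPU path mentioned in the context lines above, here is a minimal loading sketch. It assumes the `Zyphra/Zamba2-2.7B` checkpoint named later in this diff and that `use_mamba_kernels=False` is passed directly as a `from_pretrained` keyword, as the README text states; the `torch.float32` dtype is an assumption for CPU, not something the README specifies.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Sketch: load without the optimized Mamba kernels so the model can run on CPU.
# Expect noticeably higher latency than the CUDA + mamba-ssm/causal-conv1d setup.
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B")
model = AutoModelForCausalLM.from_pretrained(
    "Zyphra/Zamba2-2.7B",
    use_mamba_kernels=False,    # flag named in the README for running without the kernels
    torch_dtype=torch.float32,  # assumption: plain float32 for CPU execution
)
```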
@@ -42,8 +34,8 @@ To run on CPU, please specify `use_mamba_kernels=False` when loading the model u
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch

-tokenizer = AutoTokenizer.from_pretrained("Zyphra/
-model = AutoModelForCausalLM.from_pretrained("Zyphra/
+tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-2.7B")
+model = AutoModelForCausalLM.from_pretrained("Zyphra/Zamba2-2.7B", device_map="auto", torch_dtype=torch.bfloat16)

 input_text = "A funny prompt would be "
 input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
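The usage hunk ends after tokenizing the prompt. A hedged continuation, showing how generation would typically proceed with the standard `generate`/`decode` API; the `max_new_tokens` value is an arbitrary choice, not taken from the README.

```python
# Continuation sketch: generate from the tokenized prompt and decode the result.
outputs = model.generate(**input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
```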