Zyphra/Zamba-7B-v1

pglo committed on May 22

Commit 360d2b3 • 1 Parent(s): 625e567

Update README.md

Files changed (1)

README.md +4 -1

@@ -19,7 +19,10 @@ In order to run optimized Mamba implementations on a CUDA device, you first need
 pip install mamba-ssm causal-conv1d>=1.2.0
 ```
-You can run the model not using the optimized Mamba kernels, but it is **not** recommended as it will result in significantly higher latency. In order to do that, you'll need to specify `use_mamba_kernels=False` when loading the model.
+You can run the model not using the optimized Mamba kernels, but it is **not** recommended as it will result in significantly higher latency.
+To run on CPU, please specify `use_mamba_kernels=False` when loading the model using ``AutoModelForCausalLM.from_pretrained``.
 ## Inference
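
The added README lines describe passing `use_mamba_kernels=False` to `AutoModelForCausalLM.from_pretrained` for CPU use. A minimal sketch of what that call might look like follows. The model ID is taken from this repository's name; the helper names `load_kwargs` and `load_model` are hypothetical and exist only for illustration, and the sketch assumes an installed `transformers` build that supports Zamba.

```python
MODEL_ID = "Zyphra/Zamba-7B-v1"  # repository name as shown on this page


def load_kwargs(on_cpu: bool) -> dict:
    """Build extra keyword arguments for from_pretrained (hypothetical helper)."""
    # The optimized Mamba kernels target CUDA devices, so disable them on CPU.
    return {"use_mamba_kernels": False} if on_cpu else {}


def load_model(on_cpu: bool = True):
    """Load the tokenizer and model; this downloads the weights on first use."""
    # Imported lazily so load_kwargs can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **load_kwargs(on_cpu))
    return tokenizer, model
```

Calling `load_model(on_cpu=True)` would take the non-optimized path, which the README warns has significantly higher latency.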