---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- 8bit
- sharded
- open_llama
inference: False
---

# open_llama_13b-sharded-8bit

<a href="https://colab.research.google.com/gist/pszemraj/166ad661c6af1e024d4e2897621fc886/open_llama_13b-sharded-8bit-example.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

This is [open_llama_13b](https://huggingface.co/openlm-research/open_llama_13b), sharded into 2 GB shards and stored in 8-bit precision using `bitsandbytes==0.38.0`. Please refer to the original model card for details.
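
As a rough sanity check on what 8-bit storage and 2 GB shards imply, the arithmetic can be sketched as below. This is an estimate only: it assumes roughly 13 billion parameters, 1 byte per parameter in 8-bit versus 2 bytes in fp16, and 1 GB = 1e9 bytes. `estimate_checkpoint` is a hypothetical helper, not part of any library, and the real files will differ somewhat because not every tensor is stored in 8-bit.

```python
import math


def estimate_checkpoint(n_params, bytes_per_param, shard_size_gb=2.0):
    """Return (approximate total size in GB, approximate number of shards)."""
    total_gb = n_params * bytes_per_param / 1e9
    n_shards = math.ceil(total_gb / shard_size_gb)
    return total_gb, n_shards


# Assumption: ~13e9 parameters; int8 uses 1 byte each, fp16 uses 2 bytes each.
for label, nbytes in [("int8", 1), ("fp16", 2)]:
    size_gb, shards = estimate_checkpoint(13e9, nbytes)
    print(f"{label}: ~{size_gb:.0f} GB in ~{shards} shards of 2 GB")
```

Under these assumptions, the 8-bit checkpoint is about half the size of an fp16 one, and small shards keep peak CPU memory low while the weights are loaded.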

## loading

```sh
pip install -U -q sentencepiece transformers accelerate bitsandbytes
```

Load the model and tokenizer:

```python
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

model_name = "ethzanalytics/open_llama_13b-sharded-8bit"

# Use the slow (sentencepiece-based) tokenizer
tokenizer = LlamaTokenizer.from_pretrained(model_name, use_fast=False)

model = LlamaForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,  # load the weights in 8-bit via bitsandbytes
    device_map="auto",  # let accelerate place layers on the available devices
)
```
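
Once the model and tokenizer are loaded, generating text could look like the sketch below. It is a minimal example, not a recommended configuration: `generate_text` is a hypothetical helper, and the prompt and `max_new_tokens` value are illustrative assumptions. It relies only on the standard `tokenizer(...)`, `model.generate(...)`, and `tokenizer.decode(...)` calls from `transformers`.

```python
def generate_text(model, tokenizer, prompt, max_new_tokens=64):
    """Tokenize `prompt`, generate a continuation, and decode it to a string."""
    # Move the input tensors to the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # output_ids has shape (batch, sequence); decode the first (only) sequence
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage, with `model` and `tokenizer` loaded as shown above:
# print(generate_text(model, tokenizer, "Q: What is the largest animal?\nA:"))
```

The decoded string includes the prompt followed by the generated continuation, since `generate` returns the full sequence for decoder-only models.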