# Disclaimer
I do **NOT** own this model. It belongs to its developer (Microsoft). See the license file for more details.
# Overview
This repo contains the parameters of phi-2, which is a large language model developed by Microsoft.
# How to run
This model requires about 12.5 GB of VRAM in float32 and roughly 6.7 GB in float16.
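As a rough sanity check on these figures, the memory needed for the weights alone can be estimated as parameter count times bytes per parameter. The sketch below assumes phi-2 has roughly 2.7 billion parameters (a figure not stated in this README) and uses a hypothetical helper, `weight_memory_gb`. Actual usage is higher than the weights-only estimate, since inference also needs memory for activations and runtime overhead.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Estimate the memory taken by the weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Assumption: approximate parameter count for phi-2, not taken from this README.
PHI2_PARAMS = 2.7e9

for dtype_name, nbytes in (("float32", 4), ("float16", 2)):
    estimate = weight_memory_gb(PHI2_PARAMS, nbytes)
    print(f"{dtype_name}: ~{estimate:.1f} GB for weights alone")
```

Treat the result as a lower bound when deciding whether the model fits on a given GPU.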
## 1. Setup
Install the required libraries:
```bash
pip install sentencepiece transformers accelerate einops
```
## 2. Download the model
```python
from huggingface_hub import snapshot_download
model_path = snapshot_download(repo_id="amgadhasan/phi-2", repo_type="model", local_dir="./phi-2", local_dir_use_symlinks=False)
```
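Before loading, it can help to confirm that the download finished. The following is a minimal sketch, not part of the original instructions: `missing_files` is a hypothetical helper, and the default list of required files only names `config.json`, which `from_pretrained` needs. Extend it with whatever weight and tokenizer files the repo actually ships.

```python
from pathlib import Path


def missing_files(model_dir: str, required=("config.json",)) -> list:
    """Return the names in `required` that are not present in `model_dir`."""
    root = Path(model_dir)
    return [name for name in required if not (root / name).is_file()]


# "./phi-2" matches the local_dir used in the download step.
missing = missing_files("./phi-2")
if missing:
    print("Download looks incomplete, missing:", missing)
else:
    print("All required files are present.")
```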
## 3. Load and run the model
```python
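import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is required because this model has not been
# integrated into transformers as of version 4.35.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", trust_remote_code=True)

def generate(prompt: str, generation_params: dict = None) -> str:
    # Avoid a mutable default argument; fall back to max_length=200.
    if generation_params is None:
        generation_params = {"max_length": 200}
    # Move the tokenized prompt to the device the model was placed on.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, **generation_params)
    # batch_decode returns one string per sequence; take the first.
    completion = tokenizer.batch_decode(outputs)[0]
    return completion

prompt = "Write a short poem about the sea."  # example prompt; replace with your own
result = generate(prompt)
print(result)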
```
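Because `generate` decodes the full output sequence, the returned string starts with the prompt and may end with an end-of-text marker. The sketch below shows one way to keep only the newly generated text. `extract_continuation` is a hypothetical helper, and the `<|endoftext|>` marker is an assumption about this tokenizer's end-of-sequence token; check `tokenizer.eos_token` to confirm.

```python
# Assumption: the tokenizer's end-of-sequence token; verify with tokenizer.eos_token.
END_OF_TEXT = "<|endoftext|>"


def extract_continuation(completion: str, prompt: str) -> str:
    """Return only the newly generated text from a decoded completion."""
    # Drop the echoed prompt if the completion starts with it.
    if completion.startswith(prompt):
        completion = completion[len(prompt):]
    # Cut everything from the first end-of-text marker onward.
    completion = completion.split(END_OF_TEXT, 1)[0]
    return completion.strip()


decoded = "Write a haiku about rain.\nSoft drops on the roof<|endoftext|>"
print(extract_continuation(decoded, "Write a haiku about rain."))
```

With the code above, this would be called as `extract_continuation(result, prompt)`.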
## float16
To load this model in float16, use the following code:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is required because this model has not been
# integrated into transformers as of version 4.35.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Set the torch dtype globally, since this model class doesn't accept dtype as an argument.
torch.set_default_dtype(torch.float16)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", trust_remote_code=True)

def generate(prompt: str, generation_params: dict = None) -> str:
    # Avoid a mutable default argument; fall back to max_length=200.
    if generation_params is None:
        generation_params = {"max_length": 200}
    # Move the tokenized prompt to the device the model was placed on.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, **generation_params)
    # batch_decode returns one string per sequence; take the first.
    completion = tokenizer.batch_decode(outputs)[0]
    return completion

prompt = "Write a short poem about the sea."  # example prompt; replace with your own
result = generate(prompt)
print(result)
```
# Acknowledgments
Special thanks to Microsoft for developing and releasing this model. Also, special thanks to the Hugging Face team for hosting LLMs for free!