# Disclaimer
I do **NOT** own this model. It belongs to its developer (Microsoft). See the license file for more details.

# Overview
This repo contains the parameters of phi-2, a large language model developed by Microsoft.

# How to run
This model requires about 12.5 GB of VRAM in float32, and roughly half of that in float16.
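
As a rough back-of-the-envelope check (assuming phi-2's roughly 2.7 billion parameters, which is not stated in this repo), weight memory is approximately parameters × bytes per parameter; activations, the KV cache, and CUDA overhead add to that:

```python
# Rough weight-memory estimate; the ~2.7B parameter count is an assumption about phi-2.
# Actual VRAM usage is higher due to activations, the KV cache, and CUDA overhead.
NUM_PARAMS = 2.7e9

for dtype_name, bytes_per_param in {"float32": 4, "float16": 2}.items():
    weight_gib = NUM_PARAMS * bytes_per_param / 1024**3
    print(f"{dtype_name}: ~{weight_gib:.1f} GiB for the weights alone")
```

With overhead on top of the weights, this roughly lines up with the figures above.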

## 1. Setup
Install the required libraries:
```bash
pip install sentencepiece transformers accelerate einops
```

## 2. Download the model
```python
from huggingface_hub import snapshot_download

model_path = snapshot_download(
    repo_id="amgadhasan/phi-2",
    repo_type="model",
    local_dir="./phi-2",
    local_dir_use_symlinks=False,
)
```
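
Optionally, you can confirm the snapshot landed where you expect before loading it (a minimal sketch; the exact file names depend on the repo contents):

```python
import os

# List the files pulled into the local directory by snapshot_download.
print(model_path)
print(sorted(os.listdir(model_path)))
```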

## 3. Load and run the model
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# We need to trust remote code since phi-2 hasn't been integrated into transformers as of version 4.35.
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", trust_remote_code=True)

def generate(prompt: str, generation_params: dict = {"max_length": 200}) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, **generation_params)
    completion = tokenizer.batch_decode(outputs)[0]
    return completion

prompt = "Write a short story about a robot learning to paint."  # example prompt
result = generate(prompt)
print(result)
```
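
The `generation_params` dict is passed straight through to `model.generate`, so you can override the defaults per call. A small sketch using standard Hugging Face generation arguments (the prompt text is just an example):

```python
# Sampling instead of greedy decoding; these are standard transformers generate() kwargs.
params = {
    "max_new_tokens": 150,
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
}

print(generate("Explain the difference between a list and a tuple in Python.", params))
```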

## float16
To load the model in float16, set the default torch dtype before loading (this model's custom code does not accept a dtype argument directly):
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# We need to trust remote code since phi-2 hasn't been integrated into transformers as of version 4.35.
# Set the torch dtype globally since this model class doesn't accept a dtype argument.
torch.set_default_dtype(torch.float16)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", trust_remote_code=True)

def generate(prompt: str, generation_params: dict = {"max_length": 200}) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, **generation_params)
    completion = tokenizer.batch_decode(outputs)[0]
    return completion

prompt = "Write a short story about a robot learning to paint."  # example prompt
result = generate(prompt)
print(result)
```
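
After loading, you can sanity-check that the weights really are half precision and see the approximate footprint (a minimal sketch using standard transformers/PyTorch calls):

```python
# Confirm the parameter dtype and approximate memory footprint of the loaded model.
print(next(model.parameters()).dtype)                       # expected: torch.float16
print(f"{model.get_memory_footprint() / 1024**3:.1f} GiB")  # weights (+ buffers)

# Optional: restore the global default dtype so later float32 code is unaffected.
torch.set_default_dtype(torch.float32)
```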

# Acknowledgments
Special thanks to Microsoft for developing and releasing this model. Also, special thanks to the Hugging Face team for hosting LLMs for free!