huu-ontocord committed on
Commit
f80eb9e
1 Parent(s): e0ebba7

Update README.md

Files changed (1): README.md (+5 -3)
README.md CHANGED
@@ -6,7 +6,7 @@ license: mit
 
 Phi-3-22b is a depth-upsampled version of the 14b [Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct). We removed the bottom 8 layers of one copy of the 14b model and the top 8 layers of another copy, then stacked the remaining layers. We plan to do continued pretraining to improve performance.
 Since this model has not undergone continued pretraining, the quality may vary.
-
+Loading:
 ```
 !pip install flash-attn --no-build-isolation
 !pip install peft bitsandbytes accelerate transformers
@@ -15,11 +15,13 @@ import torch
 tokenizer = AutoTokenizer.from_pretrained("ontocord/phi-3-22b", trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained("ontocord/phi-3-22b",
 torch_dtype="auto", device_map="auto", trust_remote_code=True, )
+
+```
+Basic test
+```
 with torch.no_grad():
 print(tokenizer.batch_decode(model.generate(**tokenizer("<|user|>\nHow to explain Internet for a medieval knight?<|end|>\n<|assistant|>\n", return_tensors="pt").to('cuda'), max_new_tokens=128), use_cache=True)[0])
-
 ```
-
 Will produce:
 ```
 <|user|> How to explain Internet for a medieval knight?<|end|><|assistant|> Ah, noble knight, let me attempt to explain this mystical realm known as the Internet in terms that might resonate with your medieval understanding.
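
The depth-upsampling recipe the README describes (drop the top 8 layers of one copy, the bottom 8 layers of another, and stack what remains) can be sketched with a toy example. This is a hypothetical illustration, not the authors' merge script: plain lists stand in for the model's decoder layers, and the 40-layer count assumed for the 14b base is our assumption, not stated in the model card.

```python
def depth_upsample(copy_a, copy_b, cut=8):
    """Stack two copies of the same layer stack into a deeper one.

    copy_a contributes its lower layers (its top `cut` layers removed);
    copy_b contributes its upper layers (its bottom `cut` layers removed).
    """
    lower = copy_a[:-cut]   # layers 0 .. len-cut-1 from copy A
    upper = copy_b[cut:]    # layers cut .. len-1 from copy B
    return lower + upper

# Assumed: the 14b base has 40 decoder layers (illustrative only).
base = [f"layer_{i}" for i in range(40)]
merged = depth_upsample(list(base), list(base))

print(len(merged))   # 64: the stacked model is deeper than either copy
print(merged[31], merged[32])  # seam: layer_31 from copy A, layer_8 from copy B
```

In a real merge the same slicing would be applied to the `model.model.layers` `ModuleList` of two loaded copies before saving the combined checkpoint, which is consistent with the roughly 1.6x parameter growth from 14b to 22b.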