huu-ontocord committed on
Commit
55b8808
1 Parent(s): 589cdc3

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -4,14 +4,14 @@ license: mit
 
 ## Model Summary
 
- The Phi-3-18.5b is a depth upsampled version of the 14b [Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct). We removed the bottom 8 layers of one copy of the 14b and the top 8 layers of another copy of the 14b model and stacked them. We plan to do continued pretraining to improve performance.
+ The Phi-3-22b is a depth upsampled version of the 14b [Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct). We removed the bottom 8 layers of one copy of the 14b and the top 8 layers of another copy of the 14b model and stacked them. We plan to do continued pretraining to improve performance.
 Since this model has not undergone continued pretraining, the quality may vary.
 ```
 !pip install transformers accelerate
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
- tokenizer = AutoTokenizer.from_pretrained("ontocord/phi-3-18.5b", trust_remote_code=True)
- model = AutoModelForCausalLM.from_pretrained("ontocord/phi-3-18.5b",
+ tokenizer = AutoTokenizer.from_pretrained("ontocord/phi-3-22b", trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained("ontocord/phi-3-22b",
 torch_dtype="auto", device_map="auto", trust_remote_code=True, )
 with torch.no_grad():
     print(tokenizer.batch_decode(model.generate(**tokenizer("<|user|>\nHow to explain Internet for a medieval knight?<|end|>\n<|assistant|>\n", return_tensors="pt").to('cuda'), max_new_tokens=128, use_cache=True))[0])
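The depth-upsampling recipe described in the updated README (take two copies of the 14b model, drop the top 8 layers of one and the bottom 8 layers of the other, and stack the remainder) can be illustrated with a short sketch. This is a minimal sketch under assumptions, not the authors' actual script: it assumes the standard `transformers` Phi-3 module layout (`model.model.layers` as a `ModuleList`), and the bfloat16 dtype and the output path `phi-3-depth-upsampled` are illustrative choices.

```
# Minimal sketch (not the authors' script) of the depth upsampling described above:
# stack two copies of Phi-3-medium, keeping all but the top 8 layers of one copy
# and all but the bottom 8 layers of the other.
import torch
from transformers import AutoModelForCausalLM

BASE = "microsoft/Phi-3-medium-128k-instruct"
N_CUT = 8  # layers removed from each copy, per the README description

# Loading two full copies is memory-heavy; done here only for clarity.
lower = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, trust_remote_code=True)
upper = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, trust_remote_code=True)

# Keep everything except the top 8 layers of the first copy (the lower stack) ...
lower_layers = list(lower.model.layers[:-N_CUT])
# ... and everything except the bottom 8 layers of the second copy (the upper stack).
upper_layers = list(upper.model.layers[N_CUT:])

# Stack them and update the config to reflect the new depth.
# (In practice the per-layer attention layer_idx bookkeeping may also need
# renumbering so the KV cache indexes each stacked layer uniquely.)
lower.model.layers = torch.nn.ModuleList(lower_layers + upper_layers)
lower.config.num_hidden_layers = len(lower.model.layers)

lower.save_pretrained("phi-3-depth-upsampled")  # illustrative output path
```

A state-dict-level merge (copying and renaming layer weights without instantiating both models) would avoid holding two full copies in memory, but the module-level version above is easier to follow.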