huu-ontocord committed on
Commit
55b8808
1 Parent(s): 589cdc3

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -4,14 +4,14 @@ license: mit
 
 ## Model Summary
 
- The Phi-3-18.5b is a depth upsampled version of the 14b [Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct). We removed the bottom 8 layers of one copy of the 14b and the top 8 layers of another copy of the 14b model and stacked them. We plan to do continued pretraining to improve performance.
+ The Phi-3-22b is a depth upsampled version of the 14b [Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct). We removed the bottom 8 layers of one copy of the 14b and the top 8 layers of another copy of the 14b model and stacked them. We plan to do continued pretraining to improve performance.
 Since this model has not undergone continued pretraining, the quality may vary.
 ```
 !pip install transformers accelerate
 from transformers import AutoTokenizer, AutoModelForCausalLM
 import torch
- tokenizer = AutoTokenizer.from_pretrained("ontocord/phi-3-18.5b", trust_remote_code=True)
- model = AutoModelForCausalLM.from_pretrained("ontocord/phi-3-18.5b",
+ tokenizer = AutoTokenizer.from_pretrained("ontocord/phi-3-22b", trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained("ontocord/phi-3-22b",
 torch_dtype="auto", device_map="auto", trust_remote_code=True, )
 with torch.no_grad():
     print(tokenizer.batch_decode(model.generate(**tokenizer("<|user|>\nHow to explain Internet for a medieval knight?<|end|>\n<|assistant|>\n", return_tensors="pt").to('cuda'), max_new_tokens=128, use_cache=True))[0])
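The depth-upsampling recipe described in the updated README (take two copies of the 14b model, drop the top 8 layers of one and the bottom 8 layers of the other, and stack the remainder) can be illustrated with a short sketch. This is a minimal sketch under assumptions, not the authors' actual script: it assumes the standard `transformers` Phi-3 module layout (`model.model.layers` as a `ModuleList`), and the bfloat16 dtype and the output path `phi-3-depth-upsampled` are illustrative choices.

```
# Minimal sketch (not the authors' script) of the depth upsampling described above:
# stack two copies of Phi-3-medium, keeping all but the top 8 layers of one copy
# and all but the bottom 8 layers of the other.
import torch
from transformers import AutoModelForCausalLM

BASE = "microsoft/Phi-3-medium-128k-instruct"
N_CUT = 8  # layers removed from each copy, per the README description

# Loading two full copies is memory-heavy; done here only for clarity.
lower = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, trust_remote_code=True)
upper = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, trust_remote_code=True)

# Keep everything except the top 8 layers of the first copy (the lower stack) ...
lower_layers = list(lower.model.layers[:-N_CUT])
# ... and everything except the bottom 8 layers of the second copy (the upper stack).
upper_layers = list(upper.model.layers[N_CUT:])

# Stack them and update the config to reflect the new depth.
# (In practice the per-layer attention layer_idx bookkeeping may also need
# renumbering so the KV cache indexes each stacked layer uniquely.)
lower.model.layers = torch.nn.ModuleList(lower_layers + upper_layers)
lower.config.num_hidden_layers = len(lower.model.layers)

lower.save_pretrained("phi-3-depth-upsampled")  # illustrative output path
```

A state-dict-level merge (copying and renaming layer weights without instantiating both models) would avoid holding two full copies in memory, but the module-level version above is easier to follow.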