huu-ontocord committed on
Commit
f80eb9e
1 Parent(s): e0ebba7

Update README.md

Files changed (1): README.md (+5 -3)
README.md CHANGED
@@ -6,7 +6,7 @@ license: mit
 
 Phi-3-22b is a depth-upsampled version of the 14b [Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct). We removed the bottom 8 layers of one copy of the 14b model and the top 8 layers of another copy, then stacked the remaining layers. We plan to do continued pretraining to improve performance.
 Since this model has not undergone continued pretraining, the quality may vary.
-
+Loading:
 ```
 !pip install flash-attn --no-build-isolation
 !pip install peft bitsandbytes accelerate transformers
@@ -15,11 +15,13 @@ import torch
 tokenizer = AutoTokenizer.from_pretrained("ontocord/phi-3-22b", trust_remote_code=True)
 model = AutoModelForCausalLM.from_pretrained("ontocord/phi-3-22b",
 torch_dtype="auto", device_map="auto", trust_remote_code=True, )
+
+```
+Basic test
+```
 with torch.no_grad():
 print(tokenizer.batch_decode(model.generate(**tokenizer("<|user|>\nHow to explain Internet for a medieval knight?<|end|>\n<|assistant|>\n", return_tensors="pt").to('cuda'), max_new_tokens=128), use_cache=True)[0])
-
 ```
-
 Will produce:
 ```
 <|user|> How to explain Internet for a medieval knight?<|end|><|assistant|> Ah, noble knight, let me attempt to explain this mystical realm known as the Internet in terms that might resonate with your medieval understanding.
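
The depth-upsampling recipe the README describes (drop the top 8 layers of one copy, the bottom 8 layers of another, and stack what remains) can be sketched with a toy example. This is a hypothetical illustration, not the authors' merge script: plain lists stand in for the model's decoder layers, and the 40-layer count assumed for the 14b base is our assumption, not stated in the model card.

```python
def depth_upsample(copy_a, copy_b, cut=8):
    """Stack two copies of the same layer stack into a deeper one.

    copy_a contributes its lower layers (its top `cut` layers removed);
    copy_b contributes its upper layers (its bottom `cut` layers removed).
    """
    lower = copy_a[:-cut]   # layers 0 .. len-cut-1 from copy A
    upper = copy_b[cut:]    # layers cut .. len-1 from copy B
    return lower + upper

# Assumed: the 14b base has 40 decoder layers (illustrative only).
base = [f"layer_{i}" for i in range(40)]
merged = depth_upsample(list(base), list(base))

print(len(merged))   # 64: the stacked model is deeper than either copy
print(merged[31], merged[32])  # seam: layer_31 from copy A, layer_8 from copy B
```

In a real merge the same slicing would be applied to the `model.model.layers` `ModuleList` of two loaded copies before saving the combined checkpoint, which is consistent with the roughly 1.6x parameter growth from 14b to 22b.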