---
license: mit
---

## Model Summary

Phi-3-22b is a depth-upsampled version of the 14b [Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct). We removed the bottom 8 layers of one copy of the 14b model and the top 8 layers of another copy, then stacked the two. We plan to do continued pretraining to improve performance. Because this model has not yet undergone continued pretraining, output quality may vary.

```
# Notebook cell: outside a notebook, run `pip install transformers accelerate` in a shell instead.
!pip install transformers accelerate

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("ontocord/phi-3-22b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "ontocord/phi-3-22b",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

# Phi-3 chat format: user turn, end marker, then the assistant tag that starts the reply.
prompt = "<|user|>\nHow to explain Internet for a medieval knight?<|end|>\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    # use_cache is a generate() option, so it is passed here rather than to batch_decode().
    output_ids = model.generate(**inputs, max_new_tokens=128, use_cache=True)
print(tokenizer.batch_decode(output_ids)[0])
```

Will produce:

```
<|user|>
How to explain Internet for a medieval knight?<|end|><|assistant|>
Ah, noble knight, let me attempt to explain this mystical realm known as the Internet in terms that might resonate with your medieval understanding. Imagine, if you will, a vast kingdom stretching beyond the horizon, where countless villages, towns, and cities are connected by a network of roads, bridges, and pathways. This kingdom is not bound by physical borders, but instead, it exists in a realm beyond our own, accessible only through magical devices known as computers, tablets, and smartphones. In this kingdom, information flows like a mighty river,...
```

In 4-bit:

```
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("ontocord/phi-3-22b", trust_remote_code=True)
# 4-bit loading relies on the bitsandbytes package being installed.
model = AutoModelForCausalLM.from_pretrained(
    "ontocord/phi-3-22b",
    load_in_4bit=True,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "<|user|>\nHow to explain Internet for a medieval knight?<|end|>\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    # use_cache is a generate() option, so it is passed here rather than to batch_decode().
    output_ids = model.generate(**inputs, max_new_tokens=128, use_cache=True)
print(tokenizer.batch_decode(output_ids)[0])
```

Will produce:

```
<|user|>
How to explain Internet for a medieval knight?<|end|><|assistant|>
Ah, noble knight, let me attempt to explain this mystical network known as the Internet, using terms and analogies from your time. Imagine a vast kingdom, stretching far beyond the horizon, where countless villages, towns, and cities are connected by roads, rivers, and paths. Each village is like a castle, filled with people who share knowledge, goods, stories, and news. Now, imagine that instead of messengers, horses, or ships, there exists a magical network of invisible threads connecting all these villages. This network is invisible to the eye, yet it allows messages, scroll
```

With the same `tokenizer` and `model` still loaded:

```
import torch

prompt = "<|user|>\nExplain why it is surprising that one can build a language model small enough to fit on a phone, yet almost as powerful as ChatGPT. Just use one funny sentence.<|end|>\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128, use_cache=True)
print(tokenizer.batch_decode(output_ids)[0])
```

Will produce:

```
<|user|>
Explain why it is surprising that one can build a language model small enough to fit on a phone, yet almost as powerful as ChatGPT. Just use one funny sentence.<|end|><|assistant|>
"Who knew that fitting a ChatGPT rival in your pocket would be easier than fitting a penguin in a pocket-sized suit!"<|end|>
```

See the [Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct) model card for more details.
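
The layer stacking described in the Model Summary can be illustrated with a small, model-free sketch. It only computes which source layer each layer of the stacked model would come from; it does not touch any weights and is not the script used to build this model. Two things here are assumptions rather than statements from this card: that the 14b base has 40 decoder layers, and that the copy with its top layers removed sits below the copy with its bottom layers removed. The function name `depth_upsample_plan` and the labels `copy_a` and `copy_b` are hypothetical.

```python
def depth_upsample_plan(num_layers: int, trim: int) -> list[tuple[str, int]]:
    """Return (copy_name, source_layer_index) for every layer of the stacked model.

    Assumed ordering: copy_a (top `trim` layers removed) forms the lower half,
    copy_b (bottom `trim` layers removed) forms the upper half.
    """
    if not 0 <= trim < num_layers:
        raise ValueError("trim must satisfy 0 <= trim < num_layers")
    # copy_a keeps layers 0 .. num_layers - trim - 1 (its top `trim` layers are dropped).
    lower = [("copy_a", i) for i in range(num_layers - trim)]
    # copy_b keeps layers trim .. num_layers - 1 (its bottom `trim` layers are dropped).
    upper = [("copy_b", i) for i in range(trim, num_layers)]
    return lower + upper


# Assumption: 40 decoder layers in the 14b base; 8 layers trimmed per copy as stated above.
plan = depth_upsample_plan(num_layers=40, trim=8)
print(len(plan))   # 64
print(plan[31])    # ('copy_a', 31): last layer taken from the lower copy
print(plan[32])    # ('copy_b', 8): first layer taken from the upper copy
```

Under these assumptions the stacked model has 64 layers, with source layers 8 through 31 of the base model appearing twice, once from each copy.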