|
# Disclaimer |
|
I do **NOT** own this model. It belongs to its developer (Microsoft). See the license file for more details. |
|
|
|
# Overview |
|
This repo contains the weights of phi-2, a language model developed by Microsoft.
|
|
|
# How to run |
|
This model requires roughly 12.5 GB of VRAM in float32 and about 6.7 GB in float16.
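
As a rough sanity check, you can estimate the weight footprint from the parameter count times the bytes per parameter (the ~2.7 billion parameter figure below is an assumption, not stated in this repo); the extra few GB comes from activations, the KV cache, and framework overhead:

```python
# Back-of-envelope VRAM estimate (sketch; the 2.7B parameter count is an assumption)
def estimate_weights_gb(num_params: float, bytes_per_param: int) -> float:
    return num_params * bytes_per_param / 1e9

print(estimate_weights_gb(2.7e9, 4))  # float32 -> ~10.8 GB for the weights alone
print(estimate_weights_gb(2.7e9, 2))  # float16 -> ~5.4 GB for the weights alone
# Actual usage (~12.5 GB / ~6.7 GB) is higher due to activations, KV cache, and overhead.
```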
|
|
|
## 1. Setup |
|
Install the required libraries:
|
```bash
pip install sentencepiece transformers accelerate einops
```
|
|
|
## 2. Download the model |
|
```python
from huggingface_hub import snapshot_download

# Download the full repo snapshot to a local directory
model_path = snapshot_download(
    repo_id="amgadhasan/phi-2",
    repo_type="model",
    local_dir="./phi-2",
    local_dir_use_symlinks=False,
)
```
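
To confirm the snapshot landed where expected, you can list the downloaded files (a minimal check using the standard library):

```python
import os

# The directory should contain the model weights, config, and tokenizer files
print(sorted(os.listdir(model_path)))
```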
|
|
|
## 3. Load and run the model |
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# We need to trust remote code since this model hasn't been integrated into transformers as of version 4.35
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", trust_remote_code=True)

def generate(prompt: str, generation_params: dict = {"max_length": 200}) -> str:
    # Tokenize the prompt and move the tensors to the GPU
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, **generation_params)
    # Decode the generated token ids back into text
    completion = tokenizer.batch_decode(outputs)[0]
    return completion

prompt = "Write a short story about a robot learning to paint."  # example prompt
result = generate(prompt)
print(result)
```
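
The `generation_params` dictionary is passed straight through to `model.generate`, so you can swap in any standard generation arguments, for example:

```python
# Example: sampled generation instead of the default greedy decoding
params = {"max_new_tokens": 200, "do_sample": True, "temperature": 0.7, "top_p": 0.9}
result = generate("Explain quantum computing in simple terms.", generation_params=params)
print(result)
```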
|
|
|
|
|
## float16 |
|
To load this model in float16, use the following code: |
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# We need to trust remote code since this model hasn't been integrated into transformers as of version 4.35
# We also set the default torch dtype globally since this model class doesn't accept a dtype argument
torch.set_default_dtype(torch.float16)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", trust_remote_code=True)

def generate(prompt: str, generation_params: dict = {"max_length": 200}) -> str:
    # Tokenize the prompt and move the tensors to the GPU
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, **generation_params)
    # Decode the generated token ids back into text
    completion = tokenizer.batch_decode(outputs)[0]
    return completion

prompt = "Write a short story about a robot learning to paint."  # example prompt
result = generate(prompt)
print(result)
```
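
If your transformers version ships native Phi support (roughly 4.37 and later), you may be able to skip the global dtype change and pass the dtype directly; a sketch, not verified against this particular repo:

```python
# Sketch: newer transformers versions with built-in Phi support accept torch_dtype directly
# (assumes the checkpoint in this repo loads with the native implementation)
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype=torch.float16,
)
```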
|
|
|
# Acknowledgments |
|
Special thanks to Microsoft for developing and releasing this model. Also, special thanks to the Hugging Face team for hosting LLMs for free!