---
license: mit
language:
- en
tags:
- pytorch
- gpt
- gpt-2
---

# GPT-2 PyTorch

The original GPT-2 model weights from [https://openaipublic.blob.core.windows.net/gpt-2/models](https://openaipublic.blob.core.windows.net/gpt-2/models), converted from TensorFlow to PyTorch state dicts and PyTorch safetensors files.

## Usage

The sections below explain how the model weights can be used.

### Setup

Install the [llms-from-scratch](https://pypi.org/project/llms-from-scratch/) PyPI package:

```bash
pip install llms_from_scratch
```

Or copy and paste the `GPTModel` class and its dependencies from [GitHub](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/01_main-chapter-code/previous_chapters.py).
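
To obtain the weight files themselves, one option is the `huggingface_hub` package. The snippet below is a minimal sketch: `USER/REPO` is a hypothetical placeholder for this repository's actual id, and `gpt2-medium-355M.pth` is one of the weight files referenced below.

```python
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# "USER/REPO" is a placeholder; substitute this repository's actual id
weights_path = hf_hub_download(
    repo_id="USER/REPO",
    filename="gpt2-medium-355M.pth",
)
print(weights_path)  # Local path of the downloaded weight file
```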

### Loading the model weights

The following shows how to load the weights into the 355M parameter model:

```python
import torch
from llms_from_scratch.ch04 import GPTModel

GPT_CONFIG_BASE = {
    "vocab_size": 50257,     # Vocabulary size
    "context_length": 1024,  # Original context length
    "emb_dim": 768,          # Embedding dimension
    "n_heads": 12,           # Number of attention heads
    "n_layers": 12,          # Number of layers
    "drop_rate": 0.0,        # Dropout rate
    "qkv_bias": True         # Query-key-value bias
}

model_configs = {
    "gpt2-small (124M)": {"emb_dim": 768, "n_layers": 12, "n_heads": 12},
    "gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt2-large (774M)": {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt2-xl (1558M)": {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}

model_name = "gpt2-medium (355M)"  # Example model name
NEW_CONFIG = GPT_CONFIG_BASE.copy()
NEW_CONFIG.update(model_configs[model_name])

model = GPTModel(NEW_CONFIG)

# Option A: state dict
model.load_state_dict(torch.load("gpt2-medium-355M.pth", weights_only=True))

# Option B: safetensors
# from safetensors.torch import load_file
# model.load_state_dict(load_file("gpt2-medium-355M.safetensors"))

model.eval()
```
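
The loaded model is a standard PyTorch module, so it can optionally be moved to a GPU; this is plain PyTorch and not specific to this package:

```python
# Optional: run on a GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
```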

To use the other models, simply replace the model name and the weight filenames:

```python
model_name = "gpt2-medium (355M)"
...
model.load_state_dict(torch.load("gpt2-medium-355M.pth", weights_only=True))
# or
model.load_state_dict(load_file("gpt2-medium-355M.safetensors"))
```

For example:

```python
model_name = "gpt2-small (124M)"
...
model.load_state_dict(torch.load("gpt2-small-124M.pth", weights_only=True))
# or
model.load_state_dict(load_file("gpt2-small-124M.safetensors"))
```
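
A quick sanity check after loading is to print the parameter count. Note that the raw total exceeds the nominal model size because the token embedding and the output head are counted separately in this implementation:

```python
# Total number of parameters (embedding and output head counted separately)
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params:,}")
```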

### Generating text

The following shows how the model can then be used to generate text:

```python
import tiktoken
import torch

from llms_from_scratch.ch04 import generate_text_simple

tokenizer = tiktoken.get_encoding("gpt2")

prompt = "Ever effort moves"
enc_prompt = tokenizer.encode(prompt)
enc_prompt = torch.tensor([enc_prompt])  # Add a batch dimension

token_ids = generate_text_simple(
    model=model,
    idx=enc_prompt,
    max_new_tokens=25,
    context_size=NEW_CONFIG["context_length"]
)

output = tokenizer.decode(token_ids.squeeze().tolist())
print(output)
```

```
Ever effort moves the needle.

The first step is to understand the difference between a "good" and a "bad" goal.
```
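
For more control over decoding, the book's chapter 5 code adds a `generate` function with temperature scaling and top-k sampling. A minimal sketch, assuming the installed `llms_from_scratch` version exposes it as `llms_from_scratch.ch05.generate` with `temperature` and `top_k` arguments:

```python
import torch
from llms_from_scratch.ch05 import generate  # Assumed import path; check the package docs

torch.manual_seed(123)  # For reproducible sampling

token_ids = generate(
    model=model,
    idx=enc_prompt,
    max_new_tokens=25,
    context_size=NEW_CONFIG["context_length"],
    temperature=1.0,  # Values > 0 enable probabilistic sampling
    top_k=50          # Sample from the 50 most likely next tokens only
)

print(tokenizer.decode(token_ids.squeeze().tolist()))
```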