---
license: mit
language:
- en
tags:
- pytorch
- gpt
- gpt-2
---

# GPT-2 PyTorch

The original GPT-2 model weights from [https://openaipublic.blob.core.windows.net/gpt-2/models](https://openaipublic.blob.core.windows.net/gpt-2/models), converted from TensorFlow to PyTorch state dicts and PyTorch safetensors files.

## Usage

The sections below explain how the model weights can be used.

### Setup

Install the [llms-from-scratch](https://pypi.org/project/llms-from-scratch/) PyPI package:

```bash
pip install llms_from_scratch
```

Or copy and paste the `GPTModel` class and its dependencies from [GitHub](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/01_main-chapter-code/previous_chapters.py).
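
To obtain the weight files themselves, one option is the `huggingface_hub` package. The snippet below is a minimal sketch: `USER/REPO` is a hypothetical placeholder for this repository's actual id, and `gpt2-medium-355M.pth` is one of the weight files referenced below.

```python
from huggingface_hub import hf_hub_download  # pip install huggingface_hub

# "USER/REPO" is a placeholder; substitute this repository's actual id
weights_path = hf_hub_download(
    repo_id="USER/REPO",
    filename="gpt2-medium-355M.pth",
)
print(weights_path)  # Local path of the downloaded weight file
```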

### Loading the model weights

The following shows how to load the weights into the 355M parameter model:

```python
import torch
from llms_from_scratch.ch04 import GPTModel

GPT_CONFIG_BASE = {
    "vocab_size": 50257,     # Vocabulary size
    "context_length": 1024,  # Original context length
    "emb_dim": 768,          # Embedding dimension
    "n_heads": 12,           # Number of attention heads
    "n_layers": 12,          # Number of layers
    "drop_rate": 0.0,        # Dropout rate
    "qkv_bias": True         # Query-key-value bias
}

model_configs = {
    "gpt2-small (124M)": {"emb_dim": 768, "n_layers": 12, "n_heads": 12},
    "gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt2-large (774M)": {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt2-xl (1558M)": {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}

model_name = "gpt2-medium (355M)"  # Example model name
NEW_CONFIG = GPT_CONFIG_BASE.copy()
NEW_CONFIG.update(model_configs[model_name])

model = GPTModel(NEW_CONFIG)

# Option A: state dict
model.load_state_dict(torch.load("gpt2-medium-355M.pth", weights_only=True))

# Option B: safetensors
# from safetensors.torch import load_file
# model.load_state_dict(load_file("gpt2-medium-355M.safetensors"))

model.eval()
```
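
The loaded model is a standard PyTorch module, so it can optionally be moved to a GPU; this is plain PyTorch and not specific to this package:

```python
# Optional: run on a GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
```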

To use the other models, simply replace the model name and the weight filenames:

```python
model_name = "gpt2-medium (355M)"
...
model.load_state_dict(torch.load("gpt2-medium-355M.pth", weights_only=True))
# or
model.load_state_dict(load_file("gpt2-medium-355M.safetensors"))
```

For example:

```python
model_name = "gpt2-small (124M)"
...
model.load_state_dict(torch.load("gpt2-small-124M.pth", weights_only=True))
# or
model.load_state_dict(load_file("gpt2-small-124M.safetensors"))
```
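
A quick sanity check after loading is to print the parameter count. Note that the raw total exceeds the nominal model size because the token embedding and the output head are counted separately in this implementation:

```python
# Total number of parameters (embedding and output head counted separately)
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params:,}")
```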

### Generating text

The following shows how the model can then be used to generate text:

```python
import tiktoken
import torch

from llms_from_scratch.ch04 import generate_text_simple

tokenizer = tiktoken.get_encoding("gpt2")

prompt = "Ever effort moves"
enc_prompt = tokenizer.encode(prompt)
enc_prompt = torch.tensor([enc_prompt])  # Add a batch dimension

token_ids = generate_text_simple(
    model=model,
    idx=enc_prompt,
    max_new_tokens=25,
    context_size=NEW_CONFIG["context_length"]
)

output = tokenizer.decode(token_ids.squeeze().tolist())
print(output)
```

```
Ever effort moves the needle.

The first step is to understand the difference between a "good" and a "bad" goal.
```
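
For more control over decoding, the book's chapter 5 code adds a `generate` function with temperature scaling and top-k sampling. A minimal sketch, assuming the installed `llms_from_scratch` version exposes it as `llms_from_scratch.ch05.generate` with `temperature` and `top_k` arguments:

```python
import torch
from llms_from_scratch.ch05 import generate  # Assumed import path; check the package docs

torch.manual_seed(123)  # For reproducible sampling

token_ids = generate(
    model=model,
    idx=enc_prompt,
    max_new_tokens=25,
    context_size=NEW_CONFIG["context_length"],
    temperature=1.0,  # Values > 0 enable probabilistic sampling
    top_k=50          # Sample from the 50 most likely next tokens only
)

print(tokenizer.decode(token_ids.squeeze().tolist()))
```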