---
license: mit
language:
- en
tags:
- pytorch
- gpt
- gpt-2
---
# GPT-2 PyTorch
The original GPT-2 model weights from [https://openaipublic.blob.core.windows.net/gpt-2/models](https://openaipublic.blob.core.windows.net/gpt-2/models), converted from TensorFlow checkpoints to PyTorch state dicts and PyTorch safetensors files.
## Usage
The sections below explain how the model weights can be used.
### Setup
Install the [llms-from-scratch](https://pypi.org/project/llms-from-scratch/) PyPI package:
```bash
pip install llms_from_scratch
```
Or copy and paste the `GPTModel` class and its dependencies from [GitHub](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/01_main-chapter-code/previous_chapters.py).
### Loading the model weights
The following shows how to load the weights into the 355M parameter model:
```python
import torch
from llms_from_scratch.ch04 import GPTModel
GPT_CONFIG_BASE = {
"vocab_size": 50257, # Vocabulary size
"context_length": 1024, # Original context length
"emb_dim": 768, # Embedding dimension
"n_heads": 12, # Number of attention heads
"n_layers": 12, # Number of layers
"drop_rate": 0.0, # Dropout rate
"qkv_bias": True # Query-key-value bias
}
model_configs = {
"gpt2-small (124M)": {"emb_dim": 768, "n_layers": 12, "n_heads": 12},
"gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
"gpt2-large (774M)": {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
"gpt2-xl (1558M)": {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}
model_name = "gpt2-medium (355M)" # Example model name
NEW_CONFIG = GPT_CONFIG_BASE.copy()
NEW_CONFIG.update(model_configs[model_name])
model = GPTModel(NEW_CONFIG)
# Option A: load from a PyTorch state dict
model.load_state_dict(torch.load("gpt2-medium-355M.pth", weights_only=True))

# Option B: load from a safetensors file
# from safetensors.torch import load_file
# model.load_state_dict(load_file("gpt2-medium-355M.safetensors"))

model.eval()
```
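As a sanity check on the configs above, the advertised parameter counts can be derived by hand. The sketch below assumes tied input/output embeddings, as in the original GPT-2 release; note that `GPTModel` uses a separate output head, so its state dict holds roughly `vocab_size * emb_dim` more parameters than this tied count.

```python
# Rough parameter counts implied by the configs above, assuming the
# input and output embeddings are tied as in the original GPT-2.
def gpt2_param_count(vocab_size, context_length, emb_dim, n_layers):
    embeddings = (vocab_size + context_length) * emb_dim
    # per transformer block: attention + MLP weights and biases, plus two LayerNorms
    per_block = 12 * emb_dim**2 + 13 * emb_dim
    return embeddings + n_layers * per_block + 2 * emb_dim  # + final LayerNorm

print(round(gpt2_param_count(50257, 1024, 1024, 24) / 1e6), "M")  # → 355 M
```

The same formula reproduces 124M, 774M, and 1558M for the other three configs.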
To use the other models, replace the lines
```python
model_name = "gpt2-medium (355M)"
...
model.load_state_dict(torch.load("gpt2-medium-355M.pth"))
# or
model.load_state_dict(load_file("gpt2-medium-355M.safetensors"))
```
with the desired model and file names. For example, for the small 124M model:
```python
model_name = "gpt2-small (124M)"
...
model.load_state_dict(torch.load("gpt2-small-124M.pth"))
# or
model.load_state_dict(load_file("gpt2-small-124M.safetensors"))
```
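The weight file names follow directly from the model names, so they can also be derived programmatically. A small helper for this (hypothetical, not part of the package):

```python
def weight_filename(model_name, ext="pth"):
    # e.g. "gpt2-medium (355M)" -> "gpt2-medium-355M.pth"
    base, params = model_name.split(" ")
    return f"{base}-{params.strip('()')}.{ext}"

print(weight_filename("gpt2-small (124M)"))                   # gpt2-small-124M.pth
print(weight_filename("gpt2-xl (1558M)", ext="safetensors"))  # gpt2-xl-1558M.safetensors
```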
### Generating text
The following shows how the loaded model can be used to generate text.
```python
import torch
import tiktoken
from llms_from_scratch.ch04 import generate_text_simple
tokenizer = tiktoken.get_encoding("gpt2")
prompt = "Every effort moves"
enc_prompt = tokenizer.encode(prompt)
enc_prompt = torch.tensor([enc_prompt])
token_ids = generate_text_simple(
model=model,
idx=enc_prompt,
max_new_tokens=25,
context_size=NEW_CONFIG["context_length"]
)
output = tokenizer.decode(token_ids.squeeze().tolist())
print(output)
```
```
Every effort moves the needle.
The first step is to understand the difference between a "good" and a "bad" goal.
```
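Under the hood, `generate_text_simple` performs greedy decoding: at each step it feeds the (cropped) context through the model, takes the most likely next token, and appends it. A minimal sketch of the idea, not the book's actual implementation:

```python
import torch

def greedy_generate(model, idx, max_new_tokens, context_size):
    """Greedy decoding: repeatedly append the highest-probability next token."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -context_size:]      # crop to the context window
        with torch.no_grad():
            logits = model(idx_cond)           # (batch, seq_len, vocab_size)
        # pick the argmax over the vocabulary at the last position
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        idx = torch.cat([idx, next_id], dim=1)  # append the chosen token
    return idx
```

Because it always takes the argmax, the output is deterministic; sampling-based decoders (temperature, top-k) replace the `argmax` with a draw from the softmax distribution.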