---
license: mit
language:
- en
tags:
- pytorch
- gpt
- gpt-2
---
# GPT-2 PyTorch



The original GPT-2 model weights from [https://openaipublic.blob.core.windows.net/gpt-2/models](https://openaipublic.blob.core.windows.net/gpt-2/models) converted from TensorFlow to PyTorch state dicts and PyTorch safetensors files.
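
If you prefer to fetch the weight files programmatically, the `huggingface_hub` client can download them; note that the `repo_id` below is a placeholder and should be replaced with this repository's actual ID:

```python
# Hypothetical download example; replace repo_id with this repository's ID.
from huggingface_hub import hf_hub_download

weights_path = hf_hub_download(
    repo_id="<user>/gpt2-pytorch",     # placeholder, adjust to this repo
    filename="gpt2-medium-355M.pth",   # or the .safetensors variant
)
print(weights_path)  # local cache path of the downloaded file
```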


 
## Usage

The sections below explain how the model weights can be used.

 
### Setup

Install the [llms-from-scratch](https://pypi.org/project/llms-from-scratch/) PyPI package:

```bash
pip install llms_from_scratch
```

Or copy and paste the `GPTModel` class and its dependencies from [GitHub](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/01_main-chapter-code/previous_chapters.py).
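
To confirm the setup, a quick import check can be run (this assumes the package exposes `GPTModel` under `llms_from_scratch.ch04`, as used in the loading example below):

```python
# Sanity check that the package and its PyTorch dependency import cleanly
from llms_from_scratch.ch04 import GPTModel  # model class used below
import torch

print(torch.__version__)  # any recent PyTorch version should work
```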


 
### Loading the model weights

The following shows how to load the weights into the 355M-parameter model:

```python
import torch
from llms_from_scratch.ch04 import GPTModel

GPT_CONFIG_BASE = {
    "vocab_size": 50257,    # Vocabulary size
    "context_length": 1024, # Original context length
    "emb_dim": 768,         # Embedding dimension
    "n_heads": 12,          # Number of attention heads
    "n_layers": 12,         # Number of layers
    "drop_rate": 0.0,       # Dropout rate
    "qkv_bias": True        # Query-key-value bias
}

model_configs = {
    "gpt2-small (124M)": {"emb_dim": 768, "n_layers": 12, "n_heads": 12},
    "gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt2-large (774M)": {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt2-xl (1558M)": {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}

model_name = "gpt2-medium (355M)"  # Example model name
NEW_CONFIG = GPT_CONFIG_BASE.copy()
NEW_CONFIG.update(model_configs[model_name])

model = GPTModel(NEW_CONFIG)

# Option A: load weights from a PyTorch state dict
model.load_state_dict(torch.load("gpt2-medium-355M.pth", weights_only=True))

# Option B: load weights from a safetensors file
# from safetensors.torch import load_file
# model.load_state_dict(load_file("gpt2-medium-355M.safetensors"))

model.eval()  # disable dropout for inference
```
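
As a quick sanity check after loading, you can inspect the total parameter count. In this implementation the token embedding and the output head are separate (untied) weight matrices, so the raw count comes out somewhat higher than the nominal size in the model name:

```python
# Rough sanity check: total parameter count of the loaded model
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params:,}")
# The count exceeds the nominal "355M" because the token embedding and the
# output head are separate (untied) weight matrices in this implementation.
```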

To use one of the other models, simply replace these lines:

```python
model_name = "gpt2-medium (355M)"
...
model.load_state_dict(torch.load("gpt2-medium-355M.pth"))
# or
model.load_state_dict(load_file("gpt2-medium-355M.safetensors"))
```

with the desired model name and weight filenames. For example, for the 124M model:

```python
model_name = "gpt2-small (124M)"
...
model.load_state_dict(torch.load("gpt2-small-124M.pth"))
# or
model.load_state_dict(load_file("gpt2-small-124M.safetensors"))
```
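
To avoid keeping the model name and filenames in sync by hand, a small (hypothetical) helper can derive the filenames from the keys of the `model_configs` dictionary above:

```python
# Hypothetical helper: derive weight filenames from the model_configs keys,
# e.g. "gpt2-small (124M)" -> "gpt2-small-124M.pth"
def weight_filename(model_name, suffix="pth"):
    base = model_name.replace(" (", "-").replace(")", "")
    return f"{base}.{suffix}"

print(weight_filename("gpt2-small (124M)"))               # gpt2-small-124M.pth
print(weight_filename("gpt2-xl (1558M)", "safetensors"))  # gpt2-xl-1558M.safetensors
```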


 
### Generating text

The following shows how the loaded model can then be used to generate text.

```python
import tiktoken
import torch  # repeated here so the snippet is self-contained
from llms_from_scratch.ch04 import generate_text_simple

tokenizer = tiktoken.get_encoding("gpt2")

prompt = "Ever effort moves"
enc_prompt = tokenizer.encode(prompt)
enc_prompt = torch.tensor([enc_prompt])

token_ids = generate_text_simple(
    model=model,
    idx=enc_prompt, 
    max_new_tokens=25, 
    context_size=NEW_CONFIG["context_length"]
)

output = tokenizer.decode(token_ids.squeeze().tolist())
print(output)
```

```
Ever effort moves the needle.

The first step is to understand the difference between a "good" and a "bad" goal.
```
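
The `generate_text_simple` function above uses greedy decoding. For more varied outputs, a minimal top-k sampling loop can be written directly against the model, which returns logits of shape `(batch, seq_len, vocab_size)`; the `k` and `temperature` values below are arbitrary illustrative choices, and the sketch assumes `model`, `tokenizer`, `enc_prompt`, and `NEW_CONFIG` from the snippets above:

```python
# Minimal top-k sampling sketch: instead of always taking the most likely
# token, sample the next token from the k most likely candidates.
import torch

def generate_top_k(model, idx, max_new_tokens, context_size, k=40, temperature=0.8):
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -context_size:]           # crop to the context window
        with torch.no_grad():
            logits = model(idx_cond)[:, -1, :]      # logits for the last position
        logits = logits / temperature
        top_logits, top_idx = torch.topk(logits, k)
        probs = torch.softmax(top_logits, dim=-1)   # distribution over top-k tokens
        next_token = top_idx.gather(-1, torch.multinomial(probs, num_samples=1))
        idx = torch.cat([idx, next_token], dim=1)
    return idx

token_ids = generate_top_k(model, enc_prompt, max_new_tokens=25,
                           context_size=NEW_CONFIG["context_length"])
print(tokenizer.decode(token_ids.squeeze().tolist()))
```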