Upload folder using huggingface_hub
- README.md +120 -0
- gpt2-large-774M.pth +3 -0
- gpt2-large-774M.safetensors +3 -0
- gpt2-medium-355M.pth +3 -0
- gpt2-medium-355M.safetensors +3 -0
- gpt2-small-124M.pth +3 -0
- gpt2-small-124M.safetensors +3 -0
- gpt2-xl-1558M.pth +3 -0
- gpt2-xl-1558M.safetensors +3 -0
README.md
ADDED
@@ -0,0 +1,120 @@
# GPT-2 PyTorch

The original GPT-2 model weights from [https://openaipublic.blob.core.windows.net/gpt-2/models](https://openaipublic.blob.core.windows.net/gpt-2/models), converted from TensorFlow to PyTorch state dicts and PyTorch safetensors files.
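
Since the converted files live in this repository, they can also be fetched programmatically with the [huggingface_hub](https://pypi.org/project/huggingface-hub/) library; a minimal sketch, where the `repo_id` is a placeholder to be replaced with this repository's actual `<user>/<name>` id:

```python
from huggingface_hub import hf_hub_download

# Placeholder repo id (assumption); substitute this repository's actual id
weights_path = hf_hub_download(
    repo_id="<user>/gpt2-pytorch",
    filename="gpt2-medium-355M.pth",
)
print(weights_path)  # local path of the cached download
```
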
## Usage

The sections below explain how the model weights can be used.

### Setup

Install the [llms-from-scratch](https://pypi.org/project/llms-from-scratch/) PyPI package:

```bash
pip install llms_from_scratch
```

Or copy and paste the `GPTModel` class and its dependencies from [GitHub](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/01_main-chapter-code/previous_chapters.py).

### Loading the model weights

The following shows how to load the weights into the 355M-parameter model:

```python
import torch
from llms_from_scratch.ch04 import GPTModel

GPT_CONFIG_BASE = {
    "vocab_size": 50257,     # Vocabulary size
    "context_length": 1024,  # Original context length
    "emb_dim": 768,          # Embedding dimension
    "n_heads": 12,           # Number of attention heads
    "n_layers": 12,          # Number of layers
    "drop_rate": 0.0,        # Dropout rate
    "qkv_bias": True         # Query-key-value bias
}

model_configs = {
    "gpt2-small (124M)": {"emb_dim": 768, "n_layers": 12, "n_heads": 12},
    "gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt2-large (774M)": {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt2-xl (1558M)": {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}

model_name = "gpt2-medium (355M)"  # Example model name
NEW_CONFIG = GPT_CONFIG_BASE.copy()
NEW_CONFIG.update(model_configs[model_name])

model = GPTModel(NEW_CONFIG)

# Option A: load the PyTorch state dict
model.load_state_dict(torch.load("gpt2-medium-355M.pth", weights_only=True))

# Option B: load the safetensors file
# from safetensors.torch import load_file
# model.load_state_dict(load_file("gpt2-medium-355M.safetensors"))

model.eval()
```
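
Note that `weights_only=True` restricts `torch.load` to plain tensor data rather than arbitrary pickled objects, and the safetensors format avoids pickle entirely, so either option is safe for downloaded files. To run on a GPU, the loaded model can be moved to a device afterwards; a minimal sketch using standard PyTorch (not specific to this repository):

```python
# Pick a GPU if one is available; any input tensors passed to the model
# later must be moved to the same device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
```
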
67 |
+
|
68 |
+
To use the other models, simply replace the model names:
|
69 |
+
|
70 |
+
```
|
71 |
+
model_name = "gpt2-medium (355M)"
|
72 |
+
...
|
73 |
+
model.load_state_dict(torch.load("gpt2-medium-355M.pth"))
|
74 |
+
# or
|
75 |
+
model.load_state_dict(load_file("gpt2-medium-355M.safetensors"))
|
76 |
+
```
|
77 |
+
|
78 |
+
with the desired model names. For example:
|
79 |
+
|
80 |
+
```
|
81 |
+
model_name = "gpt2-small (124M)"
|
82 |
+
...
|
83 |
+
model.load_state_dict(torch.load("gpt2-small-124M.pth"))
|
84 |
+
# or
|
85 |
+
model.load_state_dict(load_file("gpt2-small-124M.safetensors"))
|
86 |
+
```
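
The file names follow the pattern `gpt2-<size>-<params>` with a `.pth` or `.safetensors` extension, so the mapping from a `model_configs` key to a file name can also be derived mechanically; a small hypothetical helper (not part of the llms-from-scratch package):

```python
# Hypothetical helper: "gpt2-small (124M)" -> "gpt2-small-124M.pth"
def weights_filename(model_name: str, ext: str = "pth") -> str:
    return model_name.replace(" (", "-").replace(")", "") + "." + ext

print(weights_filename("gpt2-xl (1558M)"))                   # gpt2-xl-1558M.pth
print(weights_filename("gpt2-small (124M)", "safetensors"))  # gpt2-small-124M.safetensors
```
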
### Generating text

The following shows how the loaded model can be used to generate text.

```python
import tiktoken
from llms_from_scratch.ch04 import generate_text_simple

tokenizer = tiktoken.get_encoding("gpt2")

prompt = "Ever effort moves"
enc_prompt = tokenizer.encode(prompt)
enc_prompt = torch.tensor([enc_prompt])

token_ids = generate_text_simple(
    model=model,
    idx=enc_prompt,
    max_new_tokens=25,
    context_size=NEW_CONFIG["context_length"]
)

output = tokenizer.decode(token_ids.squeeze().tolist())
print(output)
```

```
Ever effort moves the needle.

The first step is to understand the difference between a "good" and a "bad" goal.
```
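
`generate_text_simple` decodes greedily (always picking the most likely next token), which is why the output above is deterministic. The book's chapter 5 code also defines a `generate` function with temperature scaling and top-k sampling; assuming the package exposes it under `llms_from_scratch.ch05` (an assumption about the module layout worth verifying), sampled decoding would look roughly like this:

```python
# Assumes the book's `generate` function is available in the ch05 module;
# adjust the import if the package layout differs.
from llms_from_scratch.ch05 import generate

token_ids = generate(
    model=model,
    idx=enc_prompt,
    max_new_tokens=25,
    context_size=NEW_CONFIG["context_length"],
    temperature=1.0,  # > 0 enables probabilistic sampling
    top_k=25,         # restrict sampling to the 25 most likely tokens
)
print(tokenizer.decode(token_ids.squeeze().tolist()))
```
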
gpt2-large-774M.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:50d4f6a4c6cdf22734f0ef4781e26fd532cb920729bd51c28b8d8b184be95911
size 3504649736
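
Each Git LFS pointer records the SHA-256 digest of the underlying weight file, so a download can be verified locally; a minimal sketch using only the Python standard library:

```python
import hashlib

def sha256_of(path):
    """Stream the file in 1 MiB chunks to avoid loading it into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Expected digest taken from the `oid sha256:` line of the pointer above
expected = "50d4f6a4c6cdf22734f0ef4781e26fd532cb920729bd51c28b8d8b184be95911"
assert sha256_of("gpt2-large-774M.pth") == expected
```
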
gpt2-large-774M.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:418e11e11cd3154dc33c6a9127012c5650772f788cc95d96ff8ba9d4c3d93838
size 3504494824

gpt2-medium-355M.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9b6b658dc97b2f7a029990036638c7219619c6933eada7fb506c554561f73c60
size 1725923861

gpt2-medium-355M.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0151c5eb4ecf849992e9a105ae6fbd0085aaae49e4aca0b92ce2711296394e80
size 1725850968

gpt2-small-124M.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:24a9078c5b27137fb2706d2206349c759952a7de8f72ef8e305dc02511bcabf8
size 702538513

gpt2-small-124M.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8bb78718496139b351333db1ff791f1be263b46b3d2f3e91cdc8c6bca835cccb
size 702501224

gpt2-xl-1558M.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0e634dd2e3919722d1638fcc8da6f296e6a5a5e4c80cbb869c2ab83867f91853
size 6753668102

gpt2-xl-1558M.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:47b9d7f40b16200d79d11ce1567c1ac3b470f8c2f53bda47fa5c36c61667b74f
size 6753501224