rasbt committed · Commit 9277059 · verified · Parent(s): 10e65c7

Upload folder using huggingface_hub

README.md ADDED
# GPT-2 PyTorch

This repository contains the original GPT-2 model weights from [https://openaipublic.blob.core.windows.net/gpt-2/models](https://openaipublic.blob.core.windows.net/gpt-2/models), converted from TensorFlow into PyTorch state-dict (`.pth`) and safetensors files.

&nbsp;
## Usage

The sections below explain how the model weights can be used.

&nbsp;
### Setup

Install the [llms-from-scratch](https://pypi.org/project/llms-from-scratch/) PyPI package:

```bash
pip install llms_from_scratch
```

Or copy and paste the `GPTModel` class and its dependencies from [GitHub](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/01_main-chapter-code/previous_chapters.py).

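The checkpoint files themselves can be downloaded manually from the repository's file listing, or programmatically via the `huggingface_hub` client. Below is a minimal sketch; the `repo_id` is a placeholder and should be replaced with this repository's actual ID:

```python
from huggingface_hub import hf_hub_download

# Download one checkpoint file from the Hugging Face Hub.
weights_file = hf_hub_download(
    repo_id="<user>/<this-repo>",  # placeholder; use this repository's ID
    filename="gpt2-medium-355M.pth",
)
print(weights_file)  # local path to the cached checkpoint file
```
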
&nbsp;
### Loading the model weights

The following shows how to load the weights into the 355M-parameter model:

```python
import torch
from llms_from_scratch.ch04 import GPTModel

GPT_CONFIG_BASE = {
    "vocab_size": 50257,     # Vocabulary size
    "context_length": 1024,  # Original context length
    "emb_dim": 768,          # Embedding dimension
    "n_heads": 12,           # Number of attention heads
    "n_layers": 12,          # Number of layers
    "drop_rate": 0.0,        # Dropout rate
    "qkv_bias": True         # Query-key-value bias
}

model_configs = {
    "gpt2-small (124M)": {"emb_dim": 768, "n_layers": 12, "n_heads": 12},
    "gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt2-large (774M)": {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt2-xl (1558M)": {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}

model_name = "gpt2-medium (355M)"  # Example model name
NEW_CONFIG = GPT_CONFIG_BASE.copy()
NEW_CONFIG.update(model_configs[model_name])

model = GPTModel(NEW_CONFIG)

# Option A: load the PyTorch state dict
model.load_state_dict(torch.load("gpt2-medium-355M.pth", weights_only=True))

# Option B: load the safetensors file
# from safetensors.torch import load_file
# model.load_state_dict(load_file("gpt2-medium-355M.safetensors"))

model.eval()
```

To use one of the other models, simply replace the model and file names:

```python
model_name = "gpt2-medium (355M)"
...
model.load_state_dict(torch.load("gpt2-medium-355M.pth", weights_only=True))
# or
model.load_state_dict(load_file("gpt2-medium-355M.safetensors"))
```

with the desired names. For example:

```python
model_name = "gpt2-small (124M)"
...
model.load_state_dict(torch.load("gpt2-small-124M.pth", weights_only=True))
# or
model.load_state_dict(load_file("gpt2-small-124M.safetensors"))
```

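If a GPU is available, the loaded model can be moved to it before generating text. A minimal sketch using standard PyTorch device selection:

```python
# Pick the fastest available device: CUDA GPU, Apple Silicon (MPS), or CPU.
device = (
    torch.device("cuda") if torch.cuda.is_available()
    else torch.device("mps") if torch.backends.mps.is_available()
    else torch.device("cpu")
)
model.to(device)
```

If you do this, also move the encoded prompt in the next section to the same device, e.g. `enc_prompt = enc_prompt.to(device)`.
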
&nbsp;
### Generating text

The following example shows how the model can then be used to generate text:

```python
import tiktoken
from llms_from_scratch.ch04 import generate_text_simple

tokenizer = tiktoken.get_encoding("gpt2")

prompt = "Every effort moves"
enc_prompt = tokenizer.encode(prompt)
enc_prompt = torch.tensor([enc_prompt])

token_ids = generate_text_simple(
    model=model,
    idx=enc_prompt,
    max_new_tokens=25,
    context_size=NEW_CONFIG["context_length"]
)

output = tokenizer.decode(token_ids.squeeze().tolist())
print(output)
```

```
Every effort moves the needle.

The first step is to understand the difference between a "good" and a "bad" goal.
```

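`generate_text_simple` decodes greedily, always picking the most probable next token. For more varied outputs, the chapter 5 module of the same package provides a `generate` function with temperature scaling and top-k sampling. A minimal sketch, assuming your installed `llms_from_scratch` version exposes it as in the book's chapter 5 code:

```python
# Sampling-based decoding; assumes llms_from_scratch.ch05 provides `generate`
# with the chapter 5 signature (temperature/top_k as in the book).
from llms_from_scratch.ch05 import generate

torch.manual_seed(123)  # make the sampled output reproducible

token_ids = generate(
    model=model,
    idx=enc_prompt,
    max_new_tokens=25,
    context_size=NEW_CONFIG["context_length"],
    temperature=1.0,  # values > 0 enable probabilistic sampling
    top_k=50,         # sample only from the 50 most likely next tokens
)
print(tokenizer.decode(token_ids.squeeze().tolist()))
```
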
gpt2-large-774M.pth ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:50d4f6a4c6cdf22734f0ef4781e26fd532cb920729bd51c28b8d8b184be95911
size 3504649736

gpt2-large-774M.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:418e11e11cd3154dc33c6a9127012c5650772f788cc95d96ff8ba9d4c3d93838
size 3504494824

gpt2-medium-355M.pth ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:9b6b658dc97b2f7a029990036638c7219619c6933eada7fb506c554561f73c60
size 1725923861

gpt2-medium-355M.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:0151c5eb4ecf849992e9a105ae6fbd0085aaae49e4aca0b92ce2711296394e80
size 1725850968

gpt2-small-124M.pth ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:24a9078c5b27137fb2706d2206349c759952a7de8f72ef8e305dc02511bcabf8
size 702538513

gpt2-small-124M.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:8bb78718496139b351333db1ff791f1be263b46b3d2f3e91cdc8c6bca835cccb
size 702501224

gpt2-xl-1558M.pth ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:0e634dd2e3919722d1638fcc8da6f296e6a5a5e4c80cbb869c2ab83867f91853
size 6753668102

gpt2-xl-1558M.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:47b9d7f40b16200d79d11ce1567c1ac3b470f8c2f53bda47fa5c36c61667b74f
size 6753501224