rasbt committed · Commit 9277059 · verified · Parent(s): 10e65c7

Upload folder using huggingface_hub

README.md ADDED
# GPT-2 PyTorch

This repository contains the original GPT-2 model weights from [https://openaipublic.blob.core.windows.net/gpt-2/models](https://openaipublic.blob.core.windows.net/gpt-2/models), converted from TensorFlow into PyTorch state-dict (`.pth`) and safetensors files.

&nbsp;
## Usage

The sections below explain how the model weights can be used.

&nbsp;
### Setup

Install the [llms-from-scratch](https://pypi.org/project/llms-from-scratch/) PyPI package:

```bash
pip install llms_from_scratch
```

Or copy and paste the `GPTModel` class and its dependencies from [GitHub](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/01_main-chapter-code/previous_chapters.py).

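The checkpoint files themselves can be downloaded manually from the repository's file listing, or programmatically via the `huggingface_hub` client. Below is a minimal sketch; the `repo_id` is a placeholder and should be replaced with this repository's actual ID:

```python
from huggingface_hub import hf_hub_download

# Download one checkpoint file from the Hugging Face Hub.
weights_file = hf_hub_download(
    repo_id="<user>/<this-repo>",  # placeholder; use this repository's ID
    filename="gpt2-medium-355M.pth",
)
print(weights_file)  # local path to the cached checkpoint file
```
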
&nbsp;
### Loading the model weights

The following shows how to load the weights into the 355M-parameter model:

```python
import torch
from llms_from_scratch.ch04 import GPTModel

GPT_CONFIG_BASE = {
    "vocab_size": 50257,     # Vocabulary size
    "context_length": 1024,  # Original context length
    "emb_dim": 768,          # Embedding dimension
    "n_heads": 12,           # Number of attention heads
    "n_layers": 12,          # Number of layers
    "drop_rate": 0.0,        # Dropout rate
    "qkv_bias": True         # Query-key-value bias
}

model_configs = {
    "gpt2-small (124M)": {"emb_dim": 768, "n_layers": 12, "n_heads": 12},
    "gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt2-large (774M)": {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt2-xl (1558M)": {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}

model_name = "gpt2-medium (355M)"  # Example model name
NEW_CONFIG = GPT_CONFIG_BASE.copy()
NEW_CONFIG.update(model_configs[model_name])

model = GPTModel(NEW_CONFIG)

# Option A: load the PyTorch state dict
model.load_state_dict(torch.load("gpt2-medium-355M.pth", weights_only=True))

# Option B: load the safetensors file
# from safetensors.torch import load_file
# model.load_state_dict(load_file("gpt2-medium-355M.safetensors"))

model.eval()
```

To use one of the other models, simply replace the model and file names:

```python
model_name = "gpt2-medium (355M)"
...
model.load_state_dict(torch.load("gpt2-medium-355M.pth", weights_only=True))
# or
model.load_state_dict(load_file("gpt2-medium-355M.safetensors"))
```

with the desired names. For example:

```python
model_name = "gpt2-small (124M)"
...
model.load_state_dict(torch.load("gpt2-small-124M.pth", weights_only=True))
# or
model.load_state_dict(load_file("gpt2-small-124M.safetensors"))
```

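If a GPU is available, the loaded model can be moved to it before generating text. A minimal sketch using standard PyTorch device selection:

```python
# Pick the fastest available device: CUDA GPU, Apple Silicon (MPS), or CPU.
device = (
    torch.device("cuda") if torch.cuda.is_available()
    else torch.device("mps") if torch.backends.mps.is_available()
    else torch.device("cpu")
)
model.to(device)
```

If you do this, also move the encoded prompt in the next section to the same device, e.g. `enc_prompt = enc_prompt.to(device)`.
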
&nbsp;
### Generating text

The following example shows how the model can then be used to generate text:

```python
import tiktoken
from llms_from_scratch.ch04 import generate_text_simple

tokenizer = tiktoken.get_encoding("gpt2")

prompt = "Every effort moves"
enc_prompt = tokenizer.encode(prompt)
enc_prompt = torch.tensor([enc_prompt])

token_ids = generate_text_simple(
    model=model,
    idx=enc_prompt,
    max_new_tokens=25,
    context_size=NEW_CONFIG["context_length"]
)

output = tokenizer.decode(token_ids.squeeze().tolist())
print(output)
```

```
Every effort moves the needle.

The first step is to understand the difference between a "good" and a "bad" goal.
```

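`generate_text_simple` decodes greedily, always picking the most probable next token. For more varied outputs, the chapter 5 module of the same package provides a `generate` function with temperature scaling and top-k sampling. A minimal sketch, assuming your installed `llms_from_scratch` version exposes it as in the book's chapter 5 code:

```python
# Sampling-based decoding; assumes llms_from_scratch.ch05 provides `generate`
# with the chapter 5 signature (temperature/top_k as in the book).
from llms_from_scratch.ch05 import generate

torch.manual_seed(123)  # make the sampled output reproducible

token_ids = generate(
    model=model,
    idx=enc_prompt,
    max_new_tokens=25,
    context_size=NEW_CONFIG["context_length"],
    temperature=1.0,  # values > 0 enable probabilistic sampling
    top_k=50,         # sample only from the 50 most likely next tokens
)
print(tokenizer.decode(token_ids.squeeze().tolist()))
```
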
gpt2-large-774M.pth ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:50d4f6a4c6cdf22734f0ef4781e26fd532cb920729bd51c28b8d8b184be95911
size 3504649736

gpt2-large-774M.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:418e11e11cd3154dc33c6a9127012c5650772f788cc95d96ff8ba9d4c3d93838
size 3504494824

gpt2-medium-355M.pth ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:9b6b658dc97b2f7a029990036638c7219619c6933eada7fb506c554561f73c60
size 1725923861

gpt2-medium-355M.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:0151c5eb4ecf849992e9a105ae6fbd0085aaae49e4aca0b92ce2711296394e80
size 1725850968

gpt2-small-124M.pth ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:24a9078c5b27137fb2706d2206349c759952a7de8f72ef8e305dc02511bcabf8
size 702538513

gpt2-small-124M.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:8bb78718496139b351333db1ff791f1be263b46b3d2f3e91cdc8c6bca835cccb
size 702501224

gpt2-xl-1558M.pth ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:0e634dd2e3919722d1638fcc8da6f296e6a5a5e4c80cbb869c2ab83867f91853
size 6753668102

gpt2-xl-1558M.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:47b9d7f40b16200d79d11ce1567c1ac3b470f8c2f53bda47fa5c36c61667b74f
size 6753501224