---
license: mit
language:
- en
tags:
- pytorch
- gpt
- gpt-2
---

# GPT-2 PyTorch

The original GPT-2 model weights from [https://openaipublic.blob.core.windows.net/gpt-2/models](https://openaipublic.blob.core.windows.net/gpt-2/models), converted from TensorFlow to PyTorch state dicts and PyTorch safetensors files.

## Usage

The sections below explain how the model weights can be used.

### Setup

Install the [llms-from-scratch](https://pypi.org/project/llms-from-scratch/) PyPI package:

```bash
pip install llms_from_scratch
```

Or copy and paste the `GPTModel` class and its dependencies from [GitHub](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch05/01_main-chapter-code/previous_chapters.py).

### Loading the model weights

The following shows how to load the weights into the 355M-parameter model:

```python
import torch
from llms_from_scratch.ch04 import GPTModel

GPT_CONFIG_BASE = {
    "vocab_size": 50257,     # Vocabulary size
    "context_length": 1024,  # Original context length
    "emb_dim": 768,          # Embedding dimension
    "n_heads": 12,           # Number of attention heads
    "n_layers": 12,          # Number of layers
    "drop_rate": 0.0,        # Dropout rate
    "qkv_bias": True         # Query-key-value bias
}

model_configs = {
    "gpt2-small (124M)": {"emb_dim": 768, "n_layers": 12, "n_heads": 12},
    "gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt2-large (774M)": {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt2-xl (1558M)": {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}

model_name = "gpt2-medium (355M)"  # Example model name

# Start from the base config and override the size-specific settings
NEW_CONFIG = GPT_CONFIG_BASE.copy()
NEW_CONFIG.update(model_configs[model_name])

model = GPTModel(NEW_CONFIG)

# Option A: state dict
model.load_state_dict(torch.load("gpt2-medium-355M.pth", weights_only=True))

# Option B: safetensors
# from safetensors.torch import load_file
# model.load_state_dict(load_file("gpt2-medium-355M.safetensors"))

model.eval()  # Switch to inference mode (disables dropout)
```

To use one of the other models, replace the model name and the file names:

```python
model_name = "gpt2-medium (355M)"
...
model.load_state_dict(torch.load("gpt2-medium-355M.pth", weights_only=True))
# or
model.load_state_dict(load_file("gpt2-medium-355M.safetensors"))
```

with the desired ones. For example:

```python
model_name = "gpt2-small (124M)"
...
model.load_state_dict(torch.load("gpt2-small-124M.pth", weights_only=True))
# or
model.load_state_dict(load_file("gpt2-small-124M.safetensors"))
```

### Generating text

The following example shows how the loaded model can then be used to generate text:

```python
import tiktoken
from llms_from_scratch.ch04 import generate_text_simple

tokenizer = tiktoken.get_encoding("gpt2")

prompt = "Ever effort moves"
enc_prompt = tokenizer.encode(prompt)
enc_prompt = torch.tensor([enc_prompt])  # Add a batch dimension: shape (1, num_tokens)

token_ids = generate_text_simple(
    model=model,
    idx=enc_prompt,
    max_new_tokens=25,
    context_size=NEW_CONFIG["context_length"]
)

output = tokenizer.decode(token_ids.squeeze().tolist())
print(output)
```

```
Ever effort moves the needle. The first step is to understand the difference between a "good" and a "bad" goal.
```
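Switching between the four checkpoints means changing the model name and two file names by hand. The sketch below derives both from the model name alone. `config_and_filename` is a hypothetical helper, not part of `llms_from_scratch`, and it assumes every checkpoint follows the naming pattern shown above (`"gpt2-small (124M)"` → `gpt2-small-124M.pth`); the large and XL file names are extrapolated from that pattern.

```python
# Hypothetical helper, not part of llms_from_scratch. Assumes all checkpoint
# files follow the pattern "gpt2-small (124M)" -> "gpt2-small-124M.pth".
GPT_CONFIG_BASE = {
    "vocab_size": 50257,
    "context_length": 1024,
    "emb_dim": 768,
    "n_heads": 12,
    "n_layers": 12,
    "drop_rate": 0.0,
    "qkv_bias": True,
}

model_configs = {
    "gpt2-small (124M)": {"emb_dim": 768, "n_layers": 12, "n_heads": 12},
    "gpt2-medium (355M)": {"emb_dim": 1024, "n_layers": 24, "n_heads": 16},
    "gpt2-large (774M)": {"emb_dim": 1280, "n_layers": 36, "n_heads": 20},
    "gpt2-xl (1558M)": {"emb_dim": 1600, "n_layers": 48, "n_heads": 25},
}


def config_and_filename(model_name, fmt="pth"):
    """Return (config, filename) for one of the keys in model_configs."""
    if model_name not in model_configs:
        raise KeyError(f"unknown model {model_name!r}; expected one of {list(model_configs)}")
    if fmt not in ("pth", "safetensors"):
        raise ValueError("fmt must be 'pth' or 'safetensors'")
    # Merge into a new dict so GPT_CONFIG_BASE itself is never modified
    config = {**GPT_CONFIG_BASE, **model_configs[model_name]}
    # "gpt2-small (124M)" -> ("gpt2-small", "(124M)") -> "gpt2-small-124M.pth"
    base, size = model_name.split(" ")
    return config, f"{base}-{size.strip('()')}.{fmt}"


config, path = config_and_filename("gpt2-small (124M)")
print(path)  # gpt2-small-124M.pth
```

The returned `config` can be passed to `GPTModel(config)` and `path` to `torch.load(path, weights_only=True)` (or use `fmt="safetensors"` together with `load_file`). Building a fresh dict also avoids accidentally mutating the shared base config when loading several models in one session.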
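As we understand the book's chapter 4 code, `generate_text_simple` performs greedy decoding: at each step it feeds at most the last `context_size` tokens to the model, looks at the scores for the final position, and appends the highest-scoring token ID. The pure-Python sketch below illustrates that loop under this assumption. `greedy_generate` and `toy_scores` are hypothetical names, and `toy_scores` is a stand-in for the model, not GPT-2.

```python
def greedy_generate(next_token_scores, idx, max_new_tokens, context_size):
    """Append max_new_tokens token IDs to idx by greedy (argmax) decoding.

    next_token_scores: any callable mapping a list of token IDs to one score
    per vocabulary entry for the next position (stands in for the model).
    """
    idx = list(idx)  # Copy so the caller's prompt is not modified
    for _ in range(max_new_tokens):
        # Keep only the most recent context_size tokens (1024 for GPT-2)
        window = idx[-context_size:]
        scores = next_token_scores(window)
        # Greedy choice: index of the highest score, no sampling
        next_id = max(range(len(scores)), key=scores.__getitem__)
        idx.append(next_id)
    return idx


# Toy "model" over a 5-token vocabulary: always favors (last token + 1) mod 5
def toy_scores(window):
    scores = [0.0] * 5
    scores[(window[-1] + 1) % 5] = 1.0
    return scores


print(greedy_generate(toy_scores, [0], max_new_tokens=6, context_size=3))
# [0, 1, 2, 3, 4, 0, 1]
```

Because greedy decoding involves no randomness, and `drop_rate` is 0.0 with the model in `eval()` mode, running the same prompt again should yield the same continuation; varied outputs would require a sampling strategy such as temperature or top-k sampling instead of the argmax step.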