nanoGPT - Character-Level Shakespeare - Tied Weights

A small character-level, GPT-style language model trained on the works of Shakespeare with Andrej Karpathy's nanoGPT repo. It was trained as part of my project, LLMs Universally Learn a Feature Representing Token Frequency / Rarity.

Versions

This model has two versions:

  1. With tied embedding / unembedding weights (in true GPT fashion) - THIS PAGE
  2. Without tied embedding / unembedding weights
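To illustrate what "tied" means here, the following is a minimal, dependency-free sketch rather than the model's actual code. One shared matrix serves both as the embedding table (row lookup) and as the unembedding projection (dot product with every row). The names `shared`, `embed`, and `unembed` are hypothetical; the sizes are taken from the model printout under Usage.

```python
import random

VOCAB_SIZE = 65  # from Embedding(65, 384) in the model printout
N_EMBD = 384

random.seed(0)
# A single matrix of shape (VOCAB_SIZE, N_EMBD), shared by both ends of the model.
shared = [[random.gauss(0.0, 0.02) for _ in range(N_EMBD)] for _ in range(VOCAB_SIZE)]


def embed(token_id):
    """Embedding: look up one row of the shared matrix."""
    return shared[token_id]


def unembed(hidden):
    """Unembedding: dot the hidden vector with every row of the same matrix."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in shared]


# With no transformer layers in between, this maps a token id straight to logits.
logits = unembed(embed(3))
```

In the untied version, `unembed` would use a second, independently trained matrix of the same shape.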

Usage

The model can be loaded with AutoModel from Hugging Face's transformers package. Because the repo contains custom model code, trust_remote_code=True is required:

>>> from transformers import AutoModel
>>> model = AutoModel.from_pretrained("sosier/nanoGPT-shakespeare-char-tied-weights", trust_remote_code=True)
>>> model
number of parameters: 10.65M

NanoGPT(
  (transformer): ModuleDict(
    (wte): Embedding(65, 384)
    (wpe): Embedding(256, 384)
    (drop): Dropout(p=0.2, inplace=False)
    (h): ModuleList(
      (0-5): 6 x Block(
        (ln_1): LayerNorm()
        (attn): CausalSelfAttention(
          (c_attn): Linear(in_features=384, out_features=1152, bias=False)
          (c_proj): Linear(in_features=384, out_features=384, bias=False)
          (attn_dropout): Dropout(p=0.2, inplace=False)
          (resid_dropout): Dropout(p=0.2, inplace=False)
        )
        (ln_2): LayerNorm()
        (mlp): MLP(
          (c_fc): Linear(in_features=384, out_features=1536, bias=False)
          (gelu): GELU(approximate='none')
          (c_proj): Linear(in_features=1536, out_features=384, bias=False)
          (dropout): Dropout(p=0.2, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm()
  )
  (lm_head): Linear(in_features=384, out_features=65, bias=False)
)
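The reported parameter count can be checked by hand from the layer shapes above. The sketch below rests on two assumptions that the printout does not state: each LayerNorm holds only a weight vector (no bias, like the Linear layers), and the reported figure excludes the position embeddings (wpe), as nanoGPT does by default. Because the weights are tied, lm_head adds no parameters beyond wte.

```python
n_layer, n_embd, vocab_size, block_size = 6, 384, 65, 256

wte = vocab_size * n_embd   # token embeddings, shared with lm_head
wpe = block_size * n_embd   # position embeddings
layer_norm = n_embd         # weight only (assumption: no bias)

attn = n_embd * 3 * n_embd + n_embd * n_embd      # c_attn + c_proj
mlp = n_embd * 4 * n_embd + 4 * n_embd * n_embd   # c_fc + c_proj
block = 2 * layer_norm + attn + mlp

total = wte + wpe + n_layer * block + layer_norm  # final term is ln_f
reported = total - wpe                            # assumption: wpe is excluded

print(f"{reported / 1e6:.2f}M")  # prints 10.65M
```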

Training Data / Token Counts

The training data token counts can be found on my GitHub repo here and can be loaded using the instructions here.

Tokenizer

Because this is a character-level model, the tokenizer is simply a mapping from each character to its token ID, as given in the token counts (see the section above).
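A character-level tokenizer of this kind can be sketched in a few lines. This is an illustration only. `sample_text` is a hypothetical stand-in for the training text, and assigning IDs in sorted character order is an assumption. For real use, build `stoi` from the character-to-ID mapping in the token counts so the IDs match the model's 65-entry vocabulary.

```python
sample_text = "To be, or not to be"  # hypothetical stand-in for the training text

# Assumption: ids follow sorted order of the unique characters.
chars = sorted(set(sample_text))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> token id
itos = {i: ch for ch, i in stoi.items()}      # token id -> character


def encode(text):
    """Map each character to its token id; raises KeyError on unseen characters."""
    return [stoi[ch] for ch in text]


def decode(ids):
    """Map token ids back to a string."""
    return "".join(itos[i] for i in ids)


ids = encode("to be")
roundtrip = decode(ids)
```

Encoding followed by decoding returns the original string, provided every character appears in the vocabulary.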
