nanoGPT - Character-Level Shakespeare - Tied Weights

A small character-level, GPT-style language model trained on the works of Shakespeare using Andrej Karpathy's nanoGPT repo, built for my project LLMs Universally Learn a Feature Representing Token Frequency / Rarity.

Versions

This model has two versions:

  1. With tied embedding / unembedding weights (in true GPT fashion) - THIS PAGE
  2. Without tied embedding / unembedding weights

Usage

The model can be loaded using AutoModel from Hugging Face's transformers package:

>>> from transformers import AutoModel
>>> model = AutoModel.from_pretrained("sosier/nanoGPT-shakespeare-char-tied-weights", trust_remote_code=True)
>>> model
number of parameters: 10.65M

NanoGPT(
  (transformer): ModuleDict(
    (wte): Embedding(65, 384)
    (wpe): Embedding(256, 384)
    (drop): Dropout(p=0.2, inplace=False)
    (h): ModuleList(
      (0-5): 6 x Block(
        (ln_1): LayerNorm()
        (attn): CausalSelfAttention(
          (c_attn): Linear(in_features=384, out_features=1152, bias=False)
          (c_proj): Linear(in_features=384, out_features=384, bias=False)
          (attn_dropout): Dropout(p=0.2, inplace=False)
          (resid_dropout): Dropout(p=0.2, inplace=False)
        )
        (ln_2): LayerNorm()
        (mlp): MLP(
          (c_fc): Linear(in_features=384, out_features=1536, bias=False)
          (gelu): GELU(approximate='none')
          (c_proj): Linear(in_features=1536, out_features=384, bias=False)
          (dropout): Dropout(p=0.2, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm()
  )
  (lm_head): Linear(in_features=384, out_features=65, bias=False)
)
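The 10.65M figure printed above can be reproduced from the layer shapes in the module listing. As a sketch (assuming nanoGPT's convention of reporting the count excluding the position embeddings, and noting that with tied weights the lm_head shares its matrix with wte):

```python
# Parameter count implied by the architecture printout above.
# Tied weights: lm_head reuses wte's matrix, so it adds nothing extra.
n_layer, n_embd, vocab_size, block_size = 6, 384, 65, 256

per_block = (
    n_embd * 3 * n_embd      # c_attn: 384 -> 1152, bias=False
    + n_embd * n_embd        # attention c_proj: 384 -> 384
    + n_embd * 4 * n_embd    # mlp c_fc: 384 -> 1536
    + 4 * n_embd * n_embd    # mlp c_proj: 1536 -> 384
    + 2 * n_embd             # ln_1 + ln_2 (weight only, no bias)
)
total = (
    vocab_size * n_embd      # wte (also serves as lm_head)
    + block_size * n_embd    # wpe
    + n_layer * per_block
    + n_embd                 # ln_f
)
# nanoGPT reports the count with position embeddings (wpe) subtracted
reported = total - block_size * n_embd
print(f"number of parameters: {reported / 1e6:.2f}M")  # 10.65M
```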

Training Data / Token Counts

The training data token counts can be found in my GitHub repo here and can be loaded using the instructions here.

Tokenizer

As a character-level model, the tokenizer is simply a mapping from each character to its token id, as given in the token counts (see the section above).
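A character-level encode/decode pair can be sketched as below. The sample text and the derived char-to-id mapping here are illustrative only; for this model, the actual mapping is the one given in the token counts linked above.

```python
# Minimal sketch of a character-level tokenizer (nanoGPT shakespeare_char
# style): assign each distinct character an integer id, then map back.
text = "First Citizen: Speak."  # hypothetical sample, not the real corpus

chars = sorted(set(text))                     # deterministic vocabulary order
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> token id
itos = {i: ch for ch, i in stoi.items()}      # token id -> char

def encode(s: str) -> list[int]:
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

# Round-trip: decoding an encoding recovers the original string.
assert decode(encode("Speak")) == "Speak"
```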
