# nanoGPT - Character-Level Shakespeare - Weights NOT Tied
A small character-level, GPT-style language model trained on the works of Shakespeare using Andrej Karpathy's nanoGPT repo, built for my project *LLMs Universally Learn a Feature Representing Token Frequency / Rarity*.
## Versions
This model has two versions:
- With tied embedding / unembedding weights (in true GPT fashion)
- Without tied embedding / unembedding weights - THIS PAGE (see the check below)
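Whether the weights are tied is easy to verify directly. A minimal sketch, assuming the repo id and the module names shown in the Usage section below:

```python
import torch
from transformers import AutoModel

# Load the untied-weights version (this page's repo)
model = AutoModel.from_pretrained(
    "sosier/nanoGPT-shakespeare-char-weights-not-tied",
    trust_remote_code=True,
)

# With tied weights these would be the same tensor object; here the
# token embedding and the unembedding are independent parameters.
wte = model.transformer.wte.weight  # token embedding, shape (65, 384)
lm_head = model.lm_head.weight      # unembedding,     shape (65, 384)
print(wte is lm_head)               # False: not tied
print(torch.equal(wte, lm_head))    # False: learned separately
```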
## Usage
The model can be loaded using `AutoModel` from Hugging Face's `transformers` package:
```python
>>> from transformers import AutoModel
>>> model = AutoModel.from_pretrained("sosier/nanoGPT-shakespeare-char-weights-not-tied", trust_remote_code=True)
>>> model
number of parameters: 10.67M
NanoGPT(
  (transformer): ModuleDict(
    (wte): Embedding(65, 384)
    (wpe): Embedding(256, 384)
    (drop): Dropout(p=0.2, inplace=False)
    (h): ModuleList(
      (0-5): 6 x Block(
        (ln_1): LayerNorm()
        (attn): CausalSelfAttention(
          (c_attn): Linear(in_features=384, out_features=1152, bias=False)
          (c_proj): Linear(in_features=384, out_features=384, bias=False)
          (attn_dropout): Dropout(p=0.2, inplace=False)
          (resid_dropout): Dropout(p=0.2, inplace=False)
        )
        (ln_2): LayerNorm()
        (mlp): MLP(
          (c_fc): Linear(in_features=384, out_features=1536, bias=False)
          (gelu): GELU(approximate='none')
          (c_proj): Linear(in_features=1536, out_features=384, bias=False)
          (dropout): Dropout(p=0.2, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm()
  )
  (lm_head): Linear(in_features=384, out_features=65, bias=False)
)
```
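Once loaded, the model can be sampled from. A minimal sketch, assuming the uploaded custom code preserves nanoGPT's `generate(idx, max_new_tokens, temperature, top_k)` helper and that `stoi` / `itos` are the character/id mappings sketched in the Tokenizer section below:

```python
import torch

# Encode a prompt with the (assumed) character -> id mapping `stoi`
prompt = "ROMEO:"
idx = torch.tensor([[stoi[c] for c in prompt]], dtype=torch.long)

# Sample 200 new characters; the context window is 256 tokens
# (the wpe embedding in the module tree above).
model.eval()
with torch.no_grad():
    out = model.generate(idx, max_new_tokens=200, temperature=0.8, top_k=50)

# Decode back to text with the (assumed) id -> character mapping `itos`
print("".join(itos[int(i)] for i in out[0]))
```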
## Training Data / Token Counts
The training data token counts can be found in my GitHub repo here and can be loaded using the instructions here.
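As a sketch of loading them, assuming a local download of the token counts as JSON (the filename is hypothetical; see the linked instructions for the actual format):

```python
import json

# Hypothetical local path; download the token counts JSON from the
# GitHub repo linked above first.
with open("shakespeare_char_token_counts.json") as f:
    token_counts = json.load(f)

# Counts keyed by character, e.g. {" ": ..., "e": ..., ...}
print(len(token_counts))  # 65 entries, matching the model's vocab size
```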
## Tokenizer
As a character-level model, the tokenizer is simply a mapping from each character to its token id, as given in the token counts (see the section above).
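Concretely, the encode/decode mappings can be built from the token counts. A minimal sketch, assuming the JSON keys appear in token-id order (an assumption about the file format; Python preserves JSON key order on load):

```python
# Character <-> token-id mappings from the token counts loaded above
stoi = {ch: i for i, ch in enumerate(token_counts)}
itos = {i: ch for ch, i in stoi.items()}

def encode(text: str) -> list[int]:
    return [stoi[c] for c in text]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

assert decode(encode("ROMEO: ")) == "ROMEO: "
```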