---
language: en
---

# nanoGPT - Character-Level Shakespeare - Weights NOT Tied

A small, character-level, GPT-style language model trained on the works of Shakespeare using Andrej Karpathy's [nanoGPT repo](https://github.com/karpathy/nanoGPT/tree/master), built for my project [LLMs Universally Learn a Feature Representing Token Frequency / Rarity](https://github.com/sosier/LLM_Token_Frequency_Feature).

## Versions

This model comes in two versions (the sketch after the list illustrates what weight tying means):

1. [With tied embedding / unembedding weights](https://huggingface.co/sosier/nanoGPT-shakespeare-char-tied-weights) (in true GPT fashion)
2. [Without tied embedding / unembedding weights](https://huggingface.co/sosier/nanoGPT-shakespeare-char-weights-not-tied) - **THIS PAGE**
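
Weight tying means the token embedding and the output (unembedding) projection share a single weight matrix, as in GPT-2 and nanoGPT; the untied version on this page instead learns two independent matrices. A minimal PyTorch sketch of the difference (illustrative only, not the actual training code):

```python
import torch.nn as nn

vocab_size, n_embd = 65, 384  # this model's dimensions (see the module dump below)

wte = nn.Embedding(vocab_size, n_embd)               # token embedding
lm_head = nn.Linear(n_embd, vocab_size, bias=False)  # unembedding / output projection

# Tied version: both modules point at the same (65, 384) weight tensor,
# so the model learns one shared matrix.
lm_head.weight = wte.weight

# Untied version (this model): simply skip the line above, leaving
# two independently learned (65, 384) matrices.
```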

## Usage

The model can be loaded using `AutoModel` from Hugging Face's `transformers` package:

```python
>>> from transformers import AutoModel
>>> model = AutoModel.from_pretrained("sosier/nanoGPT-shakespeare-char-weights-not-tied", trust_remote_code=True)
>>> model
number of parameters: 10.67M

NanoGPT(
  (transformer): ModuleDict(
    (wte): Embedding(65, 384)
    (wpe): Embedding(256, 384)
    (drop): Dropout(p=0.2, inplace=False)
    (h): ModuleList(
      (0-5): 6 x Block(
        (ln_1): LayerNorm()
        (attn): CausalSelfAttention(
          (c_attn): Linear(in_features=384, out_features=1152, bias=False)
          (c_proj): Linear(in_features=384, out_features=384, bias=False)
          (attn_dropout): Dropout(p=0.2, inplace=False)
          (resid_dropout): Dropout(p=0.2, inplace=False)
        )
        (ln_2): LayerNorm()
        (mlp): MLP(
          (c_fc): Linear(in_features=384, out_features=1536, bias=False)
          (gelu): GELU(approximate='none')
          (c_proj): Linear(in_features=1536, out_features=384, bias=False)
          (dropout): Dropout(p=0.2, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm()
  )
  (lm_head): Linear(in_features=384, out_features=65, bias=False)
)
```
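
Once loaded, the model can be used for sampling. A minimal sketch, assuming the repo's remote code keeps nanoGPT's `generate(idx, max_new_tokens, temperature, top_k)` method (an assumption; check the model's remote code) and reusing `model` from the snippet above; `stoi` / `itos` here are placeholder character maps, built for real in the Tokenizer section below:

```python
import torch

# Placeholder char <-> token-ID maps; build the real 65-entry maps from the
# token counts file as shown in the Tokenizer section below.
stoi = {"\n": 0, " ": 1}
itos = {i: ch for ch, i in stoi.items()}

model.eval()  # `model` as loaded above
idx = torch.tensor([[stoi["\n"]]], dtype=torch.long)  # prompt: a single newline

with torch.no_grad():
    # Assumption: `generate` follows nanoGPT's signature
    out = model.generate(idx, max_new_tokens=200, temperature=0.8, top_k=50)

# Decode token IDs back to characters ("?" for IDs missing from the placeholder map)
print("".join(itos.get(int(i), "?") for i in out[0]))
```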

## Training Data / Token Counts

The training data token counts can be found in my GitHub repo [here](https://github.com/sosier/LLM_Token_Frequency_Feature/blob/main/token_counts/shakespeare_char_train_token_counts.json) and can be loaded using the instructions [here](https://github.com/sosier/LLM_Token_Frequency_Feature/tree/main/token_counts).
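
For convenience, a minimal sketch of pulling the counts straight from GitHub; the raw-file URL is derived from the link above, and the JSON structure (character -> count) is an assumption here, with the linked instructions being authoritative:

```python
import json
from urllib.request import urlopen

# Raw-file URL derived from the GitHub link above
URL = (
    "https://raw.githubusercontent.com/sosier/LLM_Token_Frequency_Feature/"
    "main/token_counts/shakespeare_char_train_token_counts.json"
)

with urlopen(URL) as response:
    token_counts = json.load(response)

# Assumption: the JSON maps each token (character) to its training-set count
print(len(token_counts))  # expected: 65, one entry per character in the vocabulary
```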

## Tokenizer

Because this is a character-level model, the tokenizer is simply a mapping from each character to its token ID, as given in the token counts (see the section above).
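
A minimal sketch of such a tokenizer, assuming the token counts JSON is keyed by character and that token IDs follow nanoGPT's `shakespeare_char` convention (enumerate the sorted unique characters):

```python
import json

# The token counts file from the section above
# (assumed format: character -> count)
with open("shakespeare_char_train_token_counts.json") as f:
    token_counts = json.load(f)

# Assumption: IDs come from enumerating the sorted unique characters,
# as in nanoGPT's shakespeare_char data prep
chars = sorted(token_counts)
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> token ID
itos = {i: ch for ch, i in stoi.items()}      # token ID -> character

def encode(text: str) -> list[int]:
    return [stoi[c] for c in text]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

print(decode(encode("To be, or not to be")))  # round-trips the input
```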