---
language: en
---

# nanoGPT - Character-Level Shakespeare - Weights NOT Tied

A small character-level, GPT-style language model trained on the works of Shakespeare using Andrej Karpathy's [nanoGPT repo](https://github.com/karpathy/nanoGPT/tree/master), built for my project [LLMs Universally Learn a Feature Representing Token Frequency / Rarity](https://github.com/sosier/LLM_Token_Frequency_Feature).

## Versions

This model has two versions:
 1. [With tied embedding / unembedding weights](https://huggingface.co/sosier/nanoGPT-shakespeare-char-tied-weights) (in true GPT fashion)
 2. [Without tied embedding / unembedding weights](https://huggingface.co/sosier/nanoGPT-shakespeare-char-weights-not-tied) - **THIS PAGE**

## Usage

The model can be loaded using `AutoModel` from Hugging Face's `transformers` package:

```python
>>> from transformers import AutoModel
>>> model = AutoModel.from_pretrained("sosier/nanoGPT-shakespeare-char-weights-not-tied", trust_remote_code=True)
>>> model
number of parameters: 10.67M

NanoGPT(
  (transformer): ModuleDict(
    (wte): Embedding(65, 384)
    (wpe): Embedding(256, 384)
    (drop): Dropout(p=0.2, inplace=False)
    (h): ModuleList(
      (0-5): 6 x Block(
        (ln_1): LayerNorm()
        (attn): CausalSelfAttention(
          (c_attn): Linear(in_features=384, out_features=1152, bias=False)
          (c_proj): Linear(in_features=384, out_features=384, bias=False)
          (attn_dropout): Dropout(p=0.2, inplace=False)
          (resid_dropout): Dropout(p=0.2, inplace=False)
        )
        (ln_2): LayerNorm()
        (mlp): MLP(
          (c_fc): Linear(in_features=384, out_features=1536, bias=False)
          (gelu): GELU(approximate='none')
          (c_proj): Linear(in_features=1536, out_features=384, bias=False)
          (dropout): Dropout(p=0.2, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm()
  )
  (lm_head): Linear(in_features=384, out_features=65, bias=False)
)
```
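Once loaded, you can sample from the model in the usual nanoGPT way. The snippet below is a minimal sketch, not part of the official usage instructions: it assumes the custom model class exposes nanoGPT's `generate(idx, max_new_tokens)` method, and it uses the `stoi` / `itos` character mappings built in the Tokenizer section below:

```python
>>> import torch
>>> # Hypothetical sketch: assumes the custom class mirrors nanoGPT's
>>> # generate() method; stoi / itos come from the Tokenizer section below
>>> model.eval()
>>> prompt = torch.tensor([[stoi[ch] for ch in "ROMEO:"]], dtype=torch.long)
>>> with torch.no_grad():
...     out = model.generate(prompt, max_new_tokens=100)
>>> print("".join(itos[i] for i in out[0].tolist()))
```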

## Training Data / Token Counts

The training data token counts can be found in my GitHub repo [here](https://github.com/sosier/LLM_Token_Frequency_Feature/blob/main/token_counts/shakespeare_char_train_token_counts.json) and can be loaded following the instructions [here](https://github.com/sosier/LLM_Token_Frequency_Feature/tree/main/token_counts).
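The counts file is plain JSON, so as a minimal sketch (assuming you have downloaded it locally under the same filename) it can be read with the standard library:

```python
>>> import json
>>> # Load the per-token training counts (file from the GitHub repo linked
>>> # above; filename assumed to match the linked file)
>>> with open("shakespeare_char_train_token_counts.json") as f:
...     token_counts = json.load(f)
```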

## Tokenizer

Since this is a character-level model, the tokenizer is simply a mapping from each character to its token ID, as given in the token counts (see the section above).
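
As a minimal sketch (assuming the counts file is keyed by character and that, per nanoGPT's `shakespeare_char` data prep, token IDs are assigned by sorting the unique characters), the mappings could be reconstructed like this:

```python
>>> # Hypothetical sketch: nanoGPT's shakespeare_char prep assigns token IDs
>>> # by sorting the unique characters; adjust if the counts file records
>>> # the IDs explicitly
>>> chars = sorted(token_counts.keys())           # token_counts from above
>>> stoi = {ch: i for i, ch in enumerate(chars)}  # character -> token ID
>>> itos = {i: ch for ch, i in stoi.items()}      # token ID -> character
>>> encode = lambda s: [stoi[c] for c in s]
>>> decode = lambda ids: "".join(itos[i] for i in ids)
>>> decode(encode("To be, or not to be"))
'To be, or not to be'
```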