sosier committed
Commit 688a981 · verified · 1 Parent(s): bcfd3d7

Update README.md

Files changed (1):
  1. README.md +57 -6

README.md CHANGED
@@ -1,9 +1,60 @@
 ---
- tags:
- - pytorch_model_hub_mixin
- - model_hub_mixin
 ---
 
- This model has been pushed to the Hub using the [PyTorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- - Library: [More Information Needed]
- - Docs: [More Information Needed]
 ---
+ language: en
 ---
 
+ # nanoGPT - Character-Level Shakespeare - Weights NOT Tied
+ 
+ A small character-level, GPT-style language model, trained on the works of Shakespeare using Andrej Karpathy's [nanoGPT repo](https://github.com/karpathy/nanoGPT/tree/master) as part of my project [LLMs Universally Learn a Feature Representing Token Frequency / Rarity](https://github.com/sosier/LLM_Token_Frequency_Feature).
+ 
+ ## Versions
+ 
+ This model has two versions (a quick programmatic check for telling them apart is sketched after this list):
+ 1. [With tied embedding / unembedding weights](https://huggingface.co/sosier/nanoGPT-shakespeare-char-tied-weights) (in true GPT fashion)
+ 2. [Without tied embedding / unembedding weights](https://huggingface.co/sosier/nanoGPT-shakespeare-char-weights-not-tied) - **THIS PAGE**
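+ 
+ As a quick sanity check (a minimal sketch, assuming the model has already been loaded as shown under Usage below; `transformer.wte` and `lm_head` are the submodule names from the module printout there), you can test whether the embedding and unembedding share one tensor:
+ 
+ ```python
+ # Weight-tying check; submodule names taken from the printed module tree below.
+ tied = model.transformer.wte.weight is model.lm_head.weight
+ print(tied)  # False for this page's (untied) version; True for the tied one
+ ```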
+ 
+ ## Usage
+ 
+ The model can be loaded using `AutoModel` from Hugging Face's `transformers` package:
+ 
+ ```python
+ >>> from transformers import AutoModel
+ >>> model = AutoModel.from_pretrained("sosier/nanoGPT-shakespeare-char-weights-not-tied", trust_remote_code=True)
+ >>> model
+ number of parameters: 10.67M
+ 
+ NanoGPT(
+   (transformer): ModuleDict(
+     (wte): Embedding(65, 384)
+     (wpe): Embedding(256, 384)
+     (drop): Dropout(p=0.2, inplace=False)
+     (h): ModuleList(
+       (0-5): 6 x Block(
+         (ln_1): LayerNorm()
+         (attn): CausalSelfAttention(
+           (c_attn): Linear(in_features=384, out_features=1152, bias=False)
+           (c_proj): Linear(in_features=384, out_features=384, bias=False)
+           (attn_dropout): Dropout(p=0.2, inplace=False)
+           (resid_dropout): Dropout(p=0.2, inplace=False)
+         )
+         (ln_2): LayerNorm()
+         (mlp): MLP(
+           (c_fc): Linear(in_features=384, out_features=1536, bias=False)
+           (gelu): GELU(approximate='none')
+           (c_proj): Linear(in_features=1536, out_features=384, bias=False)
+           (dropout): Dropout(p=0.2, inplace=False)
+         )
+       )
+     )
+     (ln_f): LayerNorm()
+   )
+   (lm_head): Linear(in_features=384, out_features=65, bias=False)
+ )
+ ```
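+ 
+ Once loaded, the model is an ordinary PyTorch module. Below is a minimal sampling sketch; it assumes the exported `NanoGPT` class keeps the `generate(idx, max_new_tokens, ...)` method from the nanoGPT repo (an assumption, not something this card confirms) and uses token ID 0 as an arbitrary prompt:
+ 
+ ```python
+ import torch
+ 
+ model.eval()
+ start = torch.zeros((1, 1), dtype=torch.long)  # arbitrary single-token prompt
+ with torch.no_grad():
+     ids = model.generate(start, max_new_tokens=100)  # generate() assumed from nanoGPT
+ print(ids[0].tolist())  # raw token IDs; see Tokenizer below for decoding
+ ```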
+ 
+ ## Training Data / Token Counts
+ 
+ The training data token counts can be found on my GitHub repo [here](https://github.com/sosier/LLM_Token_Frequency_Feature/blob/main/token_counts/shakespeare_char_train_token_counts.json) and can be loaded using the instructions [here](https://github.com/sosier/LLM_Token_Frequency_Feature/tree/main/token_counts).
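+ 
+ As a rough loading sketch (the raw-URL path below is inferred from the GitHub link above and is an assumption; the repo's own instructions take precedence):
+ 
+ ```python
+ import json
+ from urllib.request import urlopen
+ 
+ # Raw URL inferred from the repo link above (assumed, not verified)
+ URL = (
+     "https://raw.githubusercontent.com/sosier/LLM_Token_Frequency_Feature/"
+     "main/token_counts/shakespeare_char_train_token_counts.json"
+ )
+ with urlopen(URL) as f:
+     token_counts = json.load(f)
+ print(len(token_counts))  # expect 65 entries, one per character in the vocab
+ ```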
+ 
+ ## Tokenizer
+ 
+ Since this is a character-level model, the tokenizer is simply a mapping from each character to its token ID, as given in the token counts (see the section above).
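+ 
+ A minimal sketch of such a tokenizer, assuming the token-counts JSON is keyed by character with the keys listed in token-ID order (if that assumption does not hold, follow the repo's loading instructions instead):
+ 
+ ```python
+ # Builds encode/decode from `token_counts` loaded in the previous sketch.
+ # Assumption: JSON keys are characters, ordered by token ID.
+ stoi = {ch: i for i, ch in enumerate(token_counts)}
+ itos = {i: ch for ch, i in stoi.items()}
+ 
+ def encode(text: str) -> list[int]:
+     return [stoi[ch] for ch in text]
+ 
+ def decode(ids: list[int]) -> str:
+     return "".join(itos[i] for i in ids)
+ 
+ print(decode(encode("ROMEO:")))  # should round-trip exactly
+ ```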