Update README.md

README.md (CHANGED)
The frontmatter swaps the model-hub-mixin tags for a `language` tag, and the rest of the new README follows:

```diff
@@ -1,9 +1,60 @@
 ---
-tags:
-- pytorch_model_hub_mixin
-- model_hub_mixin
+language: en
 ---
 
-
-
-
```
# nanoGPT - Character-Level Shakespeare - Weights NOT Tied

A small character-level, GPT-style language model from my project [LLMs Universally Learn a Feature Representing Token Frequency / Rarity](https://github.com/sosier/LLM_Token_Frequency_Feature), trained on the works of Shakespeare using Andrej Karpathy's [nanoGPT repo](https://github.com/karpathy/nanoGPT/tree/master).
## Versions

This model has two versions:

1. [With tied embedding / unembedding weights](https://huggingface.co/sosier/nanoGPT-shakespeare-char-tied-weights) (in true GPT fashion)
2. [Without tied embedding / unembedding weights](https://huggingface.co/sosier/nanoGPT-shakespeare-char-weights-not-tied) - **THIS PAGE**

## Usage

The model can be loaded using `AutoModel` from Hugging Face's `transformers` package:
```python
>>> from transformers import AutoModel
>>> model = AutoModel.from_pretrained("sosier/nanoGPT-shakespeare-char-weights-not-tied", trust_remote_code=True)
>>> model
number of parameters: 10.67M

NanoGPT(
  (transformer): ModuleDict(
    (wte): Embedding(65, 384)
    (wpe): Embedding(256, 384)
    (drop): Dropout(p=0.2, inplace=False)
    (h): ModuleList(
      (0-5): 6 x Block(
        (ln_1): LayerNorm()
        (attn): CausalSelfAttention(
          (c_attn): Linear(in_features=384, out_features=1152, bias=False)
          (c_proj): Linear(in_features=384, out_features=384, bias=False)
          (attn_dropout): Dropout(p=0.2, inplace=False)
          (resid_dropout): Dropout(p=0.2, inplace=False)
        )
        (ln_2): LayerNorm()
        (mlp): MLP(
          (c_fc): Linear(in_features=384, out_features=1536, bias=False)
          (gelu): GELU(approximate='none')
          (c_proj): Linear(in_features=1536, out_features=384, bias=False)
          (dropout): Dropout(p=0.2, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm()
  )
  (lm_head): Linear(in_features=384, out_features=65, bias=False)
)
```
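Once loaded, the model can sample text. A minimal sketch, assuming the checkpoint keeps nanoGPT's `GPT.generate(idx, max_new_tokens, temperature=1.0, top_k=None)` sampling helper and that token id 0 is the newline character (as in the standard shakespeare_char vocabulary); neither is guaranteed by this page:

```python
import torch

# Start generation from a single newline token; id 0 is assumed to be "\n"
# in the shakespeare_char vocabulary (unique characters sorted, newline first).
idx = torch.zeros((1, 1), dtype=torch.long)

model.eval()
with torch.no_grad():
    # Assumes the checkpoint keeps nanoGPT's GPT.generate sampling helper
    out = model.generate(idx, max_new_tokens=100, temperature=0.8, top_k=50)

print(out[0].tolist())  # token ids; decode with the mapping from the Tokenizer section below
```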
## Training Data / Token Counts

The training data token counts can be found on my GitHub repo [here](https://github.com/sosier/LLM_Token_Frequency_Feature/blob/main/token_counts/shakespeare_char_train_token_counts.json) and can be loaded using the instructions [here](https://github.com/sosier/LLM_Token_Frequency_Feature/tree/main/token_counts).
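A minimal sketch of loading the counts directly, assuming a raw-file URL inferred from the GitHub path above and that the JSON maps each character to its training count (see the linked instructions for the official loading steps):

```python
import json
from urllib.request import urlopen

# Raw-file URL inferred from the GitHub blob path above (an assumption)
URL = (
    "https://raw.githubusercontent.com/sosier/LLM_Token_Frequency_Feature/"
    "main/token_counts/shakespeare_char_train_token_counts.json"
)

with urlopen(URL) as f:
    token_counts = json.load(f)  # assumed format: {character: count, ...}

print(len(token_counts))           # vocabulary size (65 characters)
print(sum(token_counts.values()))  # total training tokens
```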
## Tokenizer

As this is a character-level model, the tokenizer is simply a mapping from each character to its token id, as given in the token counts (see the section above).
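A minimal encode/decode sketch built on that mapping, assuming the token counts JSON lists the 65 characters in token-id order (`token_counts` comes from the previous section's sketch; `stoi`, `itos`, `encode`, and `decode` are illustrative names):

```python
# Build the vocabulary, assuming the JSON keys are the characters in token-id order
stoi = {ch: i for i, ch in enumerate(token_counts)}  # char -> token id
itos = {i: ch for ch, i in stoi.items()}             # token id -> char

def encode(text: str) -> list[int]:
    """Map each character to its token id."""
    return [stoi[ch] for ch in text]

def decode(ids: list[int]) -> str:
    """Map token ids back to characters."""
    return "".join(itos[i] for i in ids)

print(decode(encode("To be, or not to be")))  # round-trips unchanged
```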