---
language: en
---

# nanoGPT - Character-Level Shakespeare - Weights NOT Tied

A small, character-level, GPT-style language model trained on the works of Shakespeare using Andrej Karpathy's [nanoGPT repo](https://github.com/karpathy/nanoGPT/tree/master), built for my project [LLMs Universally Learn a Feature Representing Token Frequency / Rarity](https://github.com/sosier/LLM_Token_Frequency_Feature).

## Versions

This model comes in two versions (the sketch after the list illustrates what weight tying means):

1. [With tied embedding / unembedding weights](https://huggingface.co/sosier/nanoGPT-shakespeare-char-tied-weights) (in true GPT fashion)
2. [Without tied embedding / unembedding weights](https://huggingface.co/sosier/nanoGPT-shakespeare-char-weights-not-tied) - **THIS PAGE**
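
Weight tying means the token embedding and the output (unembedding) projection share a single weight matrix, as in GPT-2 and nanoGPT; the untied version on this page instead learns two independent matrices. A minimal PyTorch sketch of the difference (illustrative only, not the actual training code):

```python
import torch.nn as nn

vocab_size, n_embd = 65, 384  # this model's dimensions (see the module dump below)

wte = nn.Embedding(vocab_size, n_embd)               # token embedding
lm_head = nn.Linear(n_embd, vocab_size, bias=False)  # unembedding / output projection

# Tied version: both modules point at the same (65, 384) weight tensor,
# so the model learns one shared matrix.
lm_head.weight = wte.weight

# Untied version (this model): simply skip the line above, leaving
# two independently learned (65, 384) matrices.
```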

## Usage

The model can be loaded using `AutoModel` from Hugging Face's `transformers` package:

```python
>>> from transformers import AutoModel
>>> model = AutoModel.from_pretrained("sosier/nanoGPT-shakespeare-char-weights-not-tied", trust_remote_code=True)
>>> model
number of parameters: 10.67M

NanoGPT(
  (transformer): ModuleDict(
    (wte): Embedding(65, 384)
    (wpe): Embedding(256, 384)
    (drop): Dropout(p=0.2, inplace=False)
    (h): ModuleList(
      (0-5): 6 x Block(
        (ln_1): LayerNorm()
        (attn): CausalSelfAttention(
          (c_attn): Linear(in_features=384, out_features=1152, bias=False)
          (c_proj): Linear(in_features=384, out_features=384, bias=False)
          (attn_dropout): Dropout(p=0.2, inplace=False)
          (resid_dropout): Dropout(p=0.2, inplace=False)
        )
        (ln_2): LayerNorm()
        (mlp): MLP(
          (c_fc): Linear(in_features=384, out_features=1536, bias=False)
          (gelu): GELU(approximate='none')
          (c_proj): Linear(in_features=1536, out_features=384, bias=False)
          (dropout): Dropout(p=0.2, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm()
  )
  (lm_head): Linear(in_features=384, out_features=65, bias=False)
)
```
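
Once loaded, the model can be used for sampling. A minimal sketch, assuming the repo's remote code keeps nanoGPT's `generate(idx, max_new_tokens, temperature, top_k)` method (an assumption; check the model's remote code) and reusing `model` from the snippet above; `stoi` / `itos` here are placeholder character maps, built for real in the Tokenizer section below:

```python
import torch

# Placeholder char <-> token-ID maps; build the real 65-entry maps from the
# token counts file as shown in the Tokenizer section below.
stoi = {"\n": 0, " ": 1}
itos = {i: ch for ch, i in stoi.items()}

model.eval()  # `model` as loaded above
idx = torch.tensor([[stoi["\n"]]], dtype=torch.long)  # prompt: a single newline

with torch.no_grad():
    # Assumption: `generate` follows nanoGPT's signature
    out = model.generate(idx, max_new_tokens=200, temperature=0.8, top_k=50)

# Decode token IDs back to characters ("?" for IDs missing from the placeholder map)
print("".join(itos.get(int(i), "?") for i in out[0]))
```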

## Training Data / Token Counts

The training data token counts can be found in my GitHub repo [here](https://github.com/sosier/LLM_Token_Frequency_Feature/blob/main/token_counts/shakespeare_char_train_token_counts.json) and can be loaded using the instructions [here](https://github.com/sosier/LLM_Token_Frequency_Feature/tree/main/token_counts).
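
For convenience, a minimal sketch of pulling the counts straight from GitHub; the raw-file URL is derived from the link above, and the JSON structure (character -> count) is an assumption here, with the linked instructions being authoritative:

```python
import json
from urllib.request import urlopen

# Raw-file URL derived from the GitHub link above
URL = (
    "https://raw.githubusercontent.com/sosier/LLM_Token_Frequency_Feature/"
    "main/token_counts/shakespeare_char_train_token_counts.json"
)

with urlopen(URL) as response:
    token_counts = json.load(response)

# Assumption: the JSON maps each token (character) to its training-set count
print(len(token_counts))  # expected: 65, one entry per character in the vocabulary
```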

## Tokenizer

Because this is a character-level model, the tokenizer is simply a mapping from each character to its token ID, as given in the token counts (see the section above).
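
A minimal sketch of such a tokenizer, assuming the token counts JSON is keyed by character and that token IDs follow nanoGPT's `shakespeare_char` convention (enumerate the sorted unique characters):

```python
import json

# The token counts file from the section above
# (assumed format: character -> count)
with open("shakespeare_char_train_token_counts.json") as f:
    token_counts = json.load(f)

# Assumption: IDs come from enumerating the sorted unique characters,
# as in nanoGPT's shakespeare_char data prep
chars = sorted(token_counts)
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> token ID
itos = {i: ch for ch, i in stoi.items()}      # token ID -> character

def encode(text: str) -> list[int]:
    return [stoi[c] for c in text]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

print(decode(encode("To be, or not to be")))  # round-trips the input
```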