stefanhex-apollo committed
Commit 64a829f
Parent(s): 5d176c5
Update README.md

README.md CHANGED
@@ -8,9 +8,8 @@ tags: []
 This is a gpt2-small model with LayerNorm fine-tuned out.
 
 The model was fine-tuned on OpenWebText for ~500M tokens (1000 iterations of batch size ~488 at 1024 context length) while gradually disabling LayerNorm layers.
-For details see [here](https://www.lesswrong.com/posts/THzcKKQd4oWkg4dSP/you-can-remove-gpt2-s-layernorm-by-fine-tuning-for-an-hour) and the upcoming paper.
 
-There are 5 similar models available (v1 through v5) trained with different fine-tuning schedules. Please refer to the [paper](https://arxiv.org/abs/2409.13710)
+There are 5 similar models available (v1 through v5) trained with different fine-tuning schedules. Please refer to the [paper](https://arxiv.org/abs/2409.13710) or [blog post](https://www.lesswrong.com/posts/THzcKKQd4oWkg4dSP/you-can-remove-gpt2-s-layernorm-by-fine-tuning-for-an-hour)
 for details; the training code is available [here](https://github.com/ApolloResearch/gpt2_noLN). The best model (v4) is the default as of 6th September 2024 (previously v2 was the default).
 
 The model is a `GPT2LMHeadModel` (to avoid requiring `trust_remote_code`) which technically contains LayerNorm blocks.
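Because the checkpoint is exposed as a plain `GPT2LMHeadModel`, it loads with the standard `transformers` API and needs no `trust_remote_code`. Below is a minimal loading sketch; the repo id shown is a placeholder assumption and should be replaced with this model's actual Hugging Face id.

```python
# Minimal loading sketch, assuming the Hugging Face `transformers` library.
# NOTE: the repo id below is a placeholder (assumption); substitute the actual model id.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model_id = "apollo-research/gpt2_noLN"  # placeholder repo id
model = GPT2LMHeadModel.from_pretrained(model_id)  # plain GPT-2 class, no trust_remote_code needed
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")  # standard GPT-2 tokenizer

inputs = tokenizer("The LayerNorm-free GPT-2 model", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```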