stefanhex-apollo committed
Commit 64a829f
1 Parent(s): 5d176c5

Update README.md

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -8,9 +8,8 @@ tags: []
 This is a gpt2-small model with LayerNorm fine-tuned out.
 
 The model was fine-tuned on OpenWebText for ~500M tokens (1000 iterations of batch size ~488 at 1024 context length) while gradually disabling LayerNorm layers.
-For details see [here](https://www.lesswrong.com/posts/THzcKKQd4oWkg4dSP/you-can-remove-gpt2-s-layernorm-by-fine-tuning-for-an-hour) and the upcoming paper.
 
-There are 5 similar models available (v1 through v5) trained with different fine-tuning schedules. Please refer to the [paper](https://arxiv.org/abs/2409.13710)
+There are 5 similar models available (v1 through v5) trained with different fine-tuning schedules. Please refer to the [paper](https://arxiv.org/abs/2409.13710) or [blog post](https://www.lesswrong.com/posts/THzcKKQd4oWkg4dSP/you-can-remove-gpt2-s-layernorm-by-fine-tuning-for-an-hour)
 for details; the training code is available [here](https://github.com/ApolloResearch/gpt2_noLN). The best model (v4) is the default as of 6th September 2024 (previously v2 was the default).
 
 The model is a `GPT2LMHeadModel` (to avoid requiring `trust_remote_code`) which technically contains LayerNorm blocks.
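
As a quick check on the token count quoted in the README text: 1000 iterations × ~488 sequences × 1024 tokens per sequence ≈ 1000 · 488 · 1024 ≈ 5.0 × 10^8 tokens, consistent with the ~500M figure.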
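
Because the checkpoint ships as a stock `GPT2LMHeadModel`, it loads with vanilla `transformers` and no `trust_remote_code`. Below is a minimal loading sketch; the repo id `apollo-research/gpt2_noLN` is an assumption inferred from the GitHub repo name (the diff does not state the Hugging Face model id), and the standard `gpt2` tokenizer is likewise assumed.

```python
# Minimal loading sketch. The model id is an assumption (inferred from the
# GitHub repo ApolloResearch/gpt2_noLN); replace it with the actual HF repo id.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("apollo-research/gpt2_noLN")  # hypothetical id
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")  # assumes the standard gpt2-small tokenizer

inputs = tokenizer("LayerNorm-free GPT-2 is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0]))
```

Since the `GPT2LMHeadModel` class already defines the (now effectively disabled) LayerNorm blocks, no custom code path is needed at load time.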