stefanhex-apollo committed on
Commit b338000 · verified · 1 Parent(s): 8fa9a4f

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -10,7 +10,7 @@ This is a gpt2-small model with LayerNorm fine-tuned out.
  The model was fine-tuned on OpenWebText for ~500M tokens (1000 iterations of batch size ~488 at 1024 context length) while gradually disabling LayerNorm layers.
  For details see [here](https://www.lesswrong.com/posts/THzcKKQd4oWkg4dSP/you-can-remove-gpt2-s-layernorm-by-fine-tuning-for-an-hour) and the upcoming paper.
 
- There are 5 similar models available (v1 through v5) trained with different fine-tuning schedules. Please refer to the paper
+ There are 5 similar models available (v1 through v5) trained with different fine-tuning schedules. Please refer to the [paper](https://publications.apolloresearch.ai/remove_layer_norm.pdf)
  for details. The best model (v4) is the default as of 6th September 2024 (previously v2 was the default).
 
  The model is a `GPT2LMHeadModel` (to avoid requiring `trust_remote_code`) which technically contains LayerNorm blocks.
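
The ~500M token figure quoted in the README follows from the stated schedule. A minimal sketch of that arithmetic, assuming "batch size ~488" means roughly 488 sequences per iteration, each filled to the full 1024-token context (the variable names are illustrative and do not come from the training code):

```python
# Rough token budget implied by the fine-tuning schedule quoted in the README.
# Assumption: each iteration processes ~488 sequences of 1024 tokens each.
iterations = 1000
batch_size = 488        # approximate, per the README ("~488")
context_length = 1024

tokens_per_iteration = batch_size * context_length
total_tokens = iterations * tokens_per_iteration

print(f"{tokens_per_iteration:,} tokens per iteration")
print(f"{total_tokens:,} tokens in total (~{total_tokens / 1e6:.0f}M)")
```

Under these assumptions each iteration covers just under half a million tokens, so 1000 iterations land at roughly 500M, matching the README's estimate.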