160 intermediate checkpoints from the tr1-13B training run.

These checkpoints contain a known bug. Until a fix is published, if you want to use any of them, please first run the checkpoint file through this script:
```
python -c '
import sys, torch
f=sys.argv[1]
sd=torch.load(f)
d=2048
# Rebuild every causal-mask buffer as a lower-triangular (1, 1, d, d) tensor
for k in sd.keys():
    if k.endswith(".attn.bias"):
        sd[k] = torch.tril(torch.ones((d, d), dtype=torch.float16)).view(1, 1, d, d)
# Overwrite the checkpoint file in place
torch.save(sd, f)
' global_step594/pytorch_model.bin
```
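
To make it easier to see what the script touches, here is a minimal, dependency-free sketch of its two ingredients: the key filter and the lower-triangular (causal) mask. The key names below are hypothetical and only illustrate the suffix match; the real checkpoint's key names may differ. The mask is built for a tiny `d=4` instead of the script's `d=2048` (presumably the model's maximum sequence length), and as nested lists rather than a `torch` tensor.

```python
def keys_to_patch(keys, suffix=".attn.bias"):
    """Return the keys the script would overwrite (exact suffix match)."""
    return [k for k in keys if k.endswith(suffix)]


def causal_mask(d):
    """Lower-triangular d x d mask: position i may attend to positions 0..i."""
    return [[1.0 if j <= i else 0.0 for j in range(d)] for i in range(d)]


# Hypothetical key names, for illustration only.
example_keys = [
    "transformer.h.0.attn.bias",         # matches: causal-mask buffer
    "transformer.h.0.attn.c_attn.bias",  # no match: ends with "_attn.bias"
    "transformer.h.0.attn.masked_bias",  # no match
    "transformer.h.1.attn.bias",         # matches
]

patched = keys_to_patch(example_keys)
mask = causal_mask(4)

for key in patched:
    print("would patch:", key)
for row in mask:
    print(row)
```

Only keys that end in exactly `.attn.bias` are replaced, so ordinary bias parameters of the attention projections are left as they are.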