This repository contains some of the matrices described in

* Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva. 2023. Jump to Conclusions: Short-Cutting Transformers With Linear Transformations. ([arXiv:2303.09435](https://arxiv.org/abs/2303.09435))

Please cite the paper as:
```bibtex
@article{din2023jump,
  title={Jump to Conclusions: Short-Cutting Transformers With Linear Transformations},
  author={Yom Din, Alexander and Karidi, Taelin and Choshen, Leshem and Geva, Mor},
  journal={arXiv preprint arXiv:2303.09435},
  year={2023}
}
```
For example, the file `gpt2-medium/wikipedia/6_9.pickle` contains the matrix trained on the Wikipedia dataset to transform 6th-layer hidden representations of tokens into 9th-layer hidden representations for the Hugging Face transformers `gpt2-medium` model. Load and multiply as follows:
```python
import pickle

import torch

file_name = 'gpt2-medium/wikipedia/6_9.pickle'

# Load the pickled linear transformation matrix.
with open(file_name, 'rb') as f:
    mat = pickle.load(f)
assert isinstance(mat, torch.Tensor)
assert len(mat.shape) == 2
assert mat.shape[0] == mat.shape[1]

# Apply it to a (here random) hidden representation vector.
v = torch.rand(mat.shape[1])
w = mat @ v
assert w.shape == v.shape
```
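In practice one usually wants to map a whole batch of hidden representations at once, not a single vector. Since `mat @ v` acts on one column vector, a batch stacked as rows is transformed by multiplying on the right with the transpose. The sketch below uses a random stand-in for the pickled matrix (the hidden size 1024 is that of `gpt2-medium`); the batch size of 5 is an arbitrary illustration:

```python
import torch

# gpt2-medium has hidden size 1024, so the 6_9 matrix is 1024x1024.
# A random matrix stands in for the pickled one here.
hidden_size = 1024
mat = torch.rand(hidden_size, hidden_size)

# A batch of 6th-layer hidden representations, one token per row.
hidden_6 = torch.rand(5, hidden_size)

# Each row is transformed independently: row i becomes mat @ hidden_6[i].
approx_hidden_9 = hidden_6 @ mat.T
assert approx_hidden_9.shape == hidden_6.shape
```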
More information is available at [https://github.com/sashayd/mat](https://github.com/sashayd/mat).