This repository contains some of the matrices described in

* Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva. 2023. Jump to Conclusions: Short-Cutting Transformers With Linear Transformations. ([arXiv:2303.09435](https://arxiv.org/abs/2303.09435))

Please cite the paper as:
```bibtex
@article{din2023jump,
  title={Jump to Conclusions: Short-Cutting Transformers With Linear Transformations},
  author={Yom Din, Alexander and Karidi, Taelin and Choshen, Leshem and Geva, Mor},
  journal={arXiv preprint arXiv:2303.09435},
  year={2023}
}
```
For example, the file `gpt2-medium/wikipedia/6_9.pickle` contains the matrix trained on the Wikipedia dataset to transform 6th-layer hidden representations of tokens into 9th-layer hidden representations for the Hugging Face transformers `gpt2-medium` model. Load and multiply as follows:
```python
import pickle

import torch

file_name = 'gpt2-medium/wikipedia/6_9.pickle'

# Load the pickled linear transformation matrix.
with open(file_name, 'rb') as f:
    mat = pickle.load(f)
assert isinstance(mat, torch.Tensor)
assert len(mat.shape) == 2
assert mat.shape[0] == mat.shape[1]

# Apply it to a (here random) hidden representation vector.
v = torch.rand(mat.shape[1])
w = mat @ v
assert w.shape == v.shape
```
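In practice one usually wants to map a whole batch of hidden representations at once, not a single vector. Since `mat @ v` acts on one column vector, a batch stacked as rows is transformed by multiplying on the right with the transpose. The sketch below uses a random stand-in for the pickled matrix (the hidden size 1024 is that of `gpt2-medium`); the batch size of 5 is an arbitrary illustration:

```python
import torch

# gpt2-medium has hidden size 1024, so the 6_9 matrix is 1024x1024.
# A random matrix stands in for the pickled one here.
hidden_size = 1024
mat = torch.rand(hidden_size, hidden_size)

# A batch of 6th-layer hidden representations, one token per row.
hidden_6 = torch.rand(5, hidden_size)

# Each row is transformed independently: row i becomes mat @ hidden_6[i].
approx_hidden_9 = hidden_6 @ mat.T
assert approx_hidden_9.shape == hidden_6.shape
```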
More information is available at [https://github.com/sashayd/mat](https://github.com/sashayd/mat).