This repository contains some of the matrices described in

* Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva. 2023. Jump to Conclusions: Short-Cutting Transformers With Linear Transformations. ([arXiv:2303.09435](https://arxiv.org/abs/2303.09435))

Please cite the paper as:

```bibtex
@article{din2023jump,
  title={Jump to Conclusions: Short-Cutting Transformers With Linear Transformations},
  author={Yom Din, Alexander and Karidi, Taelin and Choshen, Leshem and Geva, Mor},
  journal={arXiv preprint arXiv:2303.09435},
  year={2023}
}
```

For example, the file `gpt2-medium/wikipedia/6_9.pickle` contains the matrix trained on the Wikipedia dataset to transform 6th-layer hidden representations of tokens into 9th-layer hidden representations for the Hugging Face Transformers `gpt2-medium` model. One loads a matrix and multiplies by it as follows:

```python
import pickle

import torch

# Path to a pickled matrix, e.g. the layer-6-to-layer-9 matrix for gpt2-medium.
file_name = 'gpt2-medium/wikipedia/6_9.pickle'

with open(file_name, 'rb') as f:
    mat = pickle.load(f)

# Each matrix is a square 2D torch tensor (hidden_dim x hidden_dim).
assert isinstance(mat, torch.Tensor)
assert len(mat.shape) == 2
assert mat.shape[0] == mat.shape[1]

# Apply the linear transformation to a (here random) hidden representation.
v = torch.rand(mat.shape[1])
w = mat @ v

assert w.shape == v.shape
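
As a rough end-to-end sketch (an illustration, not part of this repository: the example sentence and the assumption that `hidden_states[i]` is the output of layer `i` are mine, and `mat` is reused from the snippet above), one can compare the matrix's prediction against the model's actual 9th-layer representation:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('gpt2-medium')
model = AutoModel.from_pretrained('gpt2-medium')

inputs = tokenizer('Paris is the capital of', return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple of (num_layers + 1) tensors of shape
# (batch, seq_len, hidden_dim); index 0 is the embedding output, so index i
# is assumed here to be the output of layer i.
h6 = outputs.hidden_states[6][0, -1]  # layer-6 representation of the last token
h9 = outputs.hidden_states[9][0, -1]  # layer-9 representation of the last token

w = mat @ h6  # shortcut prediction of the layer-9 representation

# One rough quality measure: cosine similarity between prediction and target.
print(torch.nn.functional.cosine_similarity(w, h9, dim=0).item())
```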

More information is available at [https://github.com/sashayd/mat](https://github.com/sashayd/mat).