This repository contains some of the matrices described in

* Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva. 2023. Jump to Conclusions: Short-Cutting Transformers With Linear Transformations. ([arXiv:2303.09435](https://arxiv.org/abs/2303.09435))

Please cite the paper as:

```bibtex
@article{din2023jump,
  title={Jump to Conclusions: Short-Cutting Transformers With Linear Transformations},
  author={Yom Din, Alexander and Karidi, Taelin and Choshen, Leshem and Geva, Mor},
  journal={arXiv preprint arXiv:2303.09435},
  year={2023}
}
```

For example, the file `gpt2-medium/wikipedia/6_9.pickle` contains the matrix trained on the Wikipedia dataset to transform 6th-layer hidden representations of tokens into 9th-layer hidden representations for the Hugging Face Transformers `gpt2-medium` model. One loads a matrix and multiplies by it as follows:

```python
import pickle

import torch

# Path to a pickled matrix, e.g. the layer-6-to-layer-9 matrix for gpt2-medium.
file_name = 'gpt2-medium/wikipedia/6_9.pickle'

with open(file_name, 'rb') as f:
    mat = pickle.load(f)

# Each matrix is a square 2D torch tensor (hidden_dim x hidden_dim).
assert isinstance(mat, torch.Tensor)
assert len(mat.shape) == 2
assert mat.shape[0] == mat.shape[1]

# Apply the linear transformation to a (here random) hidden representation.
v = torch.rand(mat.shape[1])
w = mat @ v

assert w.shape == v.shape
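
As a rough end-to-end sketch (an illustration, not part of this repository: the example sentence and the assumption that `hidden_states[i]` is the output of layer `i` are mine, and `mat` is reused from the snippet above), one can compare the matrix's prediction against the model's actual 9th-layer representation:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('gpt2-medium')
model = AutoModel.from_pretrained('gpt2-medium')

inputs = tokenizer('Paris is the capital of', return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple of (num_layers + 1) tensors of shape
# (batch, seq_len, hidden_dim); index 0 is the embedding output, so index i
# is assumed here to be the output of layer i.
h6 = outputs.hidden_states[6][0, -1]  # layer-6 representation of the last token
h9 = outputs.hidden_states[9][0, -1]  # layer-9 representation of the last token

w = mat @ h6  # shortcut prediction of the layer-9 representation

# One rough quality measure: cosine similarity between prediction and target.
print(torch.nn.functional.cosine_similarity(w, h9, dim=0).item())
```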

More information is available at [https://github.com/sashayd/mat](https://github.com/sashayd/mat).