sashay committed on Apr 24, 2023

This repository contains some of the matrices described in:
* Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva. 2023. Jump to Conclusions: Short-Cutting Transformers With Linear Transformations. ([arXiv:2303.09435](https://arxiv.org/abs/2303.09435))

Please cite the paper as:
```bibtex
@article{din2023jump,
      title={Jump to Conclusions: Short-Cutting Transformers With Linear Transformations},
      author={Yom Din, Alexander and Karidi, Taelin and Choshen, Leshem and Geva, Mor},
      journal={arXiv preprint arXiv:2303.09435},
      year={2023},
}
```
For example, the file `gpt2-medium/wikipedia/6_9.pickle` contains the matrix trained to map 6th-layer hidden representations of tokens to 9th-layer hidden representations of the Hugging Face Transformers `gpt2-medium` model. It can be loaded and applied to a vector as follows:
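```python
import pickle

import torch


def mul(mat, v):
    # Multiply mat by each vector along the last axis of v; v may be a single
    # vector of shape (d,) or a batch of shape (..., d).
    return (mat @ v[..., None]).squeeze(-1)


# One of the matrices in this repository (6th -> 9th layer of gpt2-medium).
file_name = 'gpt2-medium/wikipedia/6_9.pickle'
with open(file_name, 'rb') as f:
    mat = pickle.load(f)

# The matrix is a square torch.Tensor of shape (hidden_size, hidden_size).
assert isinstance(mat, torch.Tensor)
assert mat.ndim == 2
assert mat.shape[0] == mat.shape[1]

# A random vector standing in for a 6th-layer hidden representation.
v = torch.rand(mat.shape[1])
w = mul(mat, v)  # the corresponding estimate of the 9th-layer representation
assert w.shape == v.shape
```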
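The following is a minimal sketch of applying one matrix to every token of a sequence at once. It runs without any file from this repository: `mat` and `h` are random stand-ins (an assumption for illustration), `hidden_size = 1024` is the hidden size of `gpt2-medium`, and `matrix_path` is a hypothetical helper that assumes other files follow the same `<model>/<dataset>/<src>_<dst>.pickle` pattern as `gpt2-medium/wikipedia/6_9.pickle`. The repository holds only some of the matrices, so a path built this way may not exist.

```python
import torch


def mul(mat, v):
    # Same helper as in the loading example: multiplies mat by every vector
    # along the last axis of v, so v may be (d,) or a batch (..., d).
    return (mat @ v[..., None]).squeeze(-1)


def matrix_path(model_name, dataset, src_layer, dst_layer):
    # Hypothetical helper: assumes every file follows the pattern of the
    # example gpt2-medium/wikipedia/6_9.pickle. Not every pair is provided.
    return f"{model_name}/{dataset}/{src_layer}_{dst_layer}.pickle"


hidden_size = 1024  # hidden size of gpt2-medium
seq_len = 5  # arbitrary number of tokens, for illustration only

# Random stand-ins (assumption): in practice mat would be loaded from a pickle
# file and h would hold 6th-layer hidden representations, one row per token.
torch.manual_seed(0)
mat = torch.randn(hidden_size, hidden_size)
h = torch.randn(seq_len, hidden_size)

# Map all tokens at once; each row is an estimate of a 9th-layer representation.
approx_9 = mul(mat, h)

# Same result in row-vector form: row i of h @ mat.T equals mat @ h[i].
assert torch.allclose(approx_9, h @ mat.T, rtol=1e-4, atol=1e-3)

print(matrix_path("gpt2-medium", "wikipedia", 6, 9))  # gpt2-medium/wikipedia/6_9.pickle
print(approx_9.shape)  # torch.Size([5, 1024])
```

Because `mul` appends a trailing axis and relies on broadcasting matrix multiplication, it handles a single vector and a `(seq_len, hidden_size)` batch with the same code; `h @ mat.T` is an equivalent form when the token representations are stored as rows.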