# m2mKD

This repository contains the checkpoints for [m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers](https://arxiv.org/abs/2402.16918).

## Released checkpoints

For instructions on using the checkpoints listed below, please refer to our [GitHub repo](https://github.com/kamanphoebe/m2mKD).

- `nac_scale_tinyimnet.pth`/`nac_scale_imnet.pth`: NAC models with a scale-free prior, trained using m2mKD.
- `vmoe_base.pth`: V-MoE-Base model trained using m2mKD.
- `FT_huge`: a directory containing DeiT-Huge teacher modules for NAC model training.
- `nac_tinyimnet_students`: a directory containing NAC student modules for Tiny-ImageNet.

## Acknowledgement

Our implementation is mainly based on [Deep-Incubation](https://github.com/LeapLabTHU/Deep-Incubation).

## Citation

If you use the checkpoints, please cite our paper:

```
@misc{lo2024m2mkd,
    title={m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers},
    author={Ka Man Lo and Yiming Liang and Wenyu Du and Yuantao Fan and Zili Wang and Wenhao Huang and Lei Ma and Jie Fu},
    year={2024},
    eprint={2402.16918},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```