# MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

Yanghao Li*, Chao-Yuan Wu*, Haoqi Fan, Karttikeya Mangalam, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer*

[[`arXiv`](https://arxiv.org/abs/2112.01526)] [[`BibTeX`](#CitingMViTv2)]

In this repository, we provide detection configs and models for MViTv2 (CVPR 2022) in Detectron2. For image classification tasks, please refer to the [MViTv2 repo](https://github.com/facebookresearch/mvit).

## Results and Pretrained Models

### COCO
| Name | pre-train | Method | epochs | box AP | mask AP | #params | FLOPS | model id | download |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| MViTV2-T | IN1K | Mask R-CNN | 36 | 48.3 | 43.8 | 44M | 279G | 307611773 | model |
| MViTV2-T | IN1K | Cascade Mask R-CNN | 36 | 52.2 | 45.0 | 76M | 701G | 308344828 | model |
| MViTV2-S | IN1K | Cascade Mask R-CNN | 36 | 53.2 | 46.0 | 87M | 748G | 308344647 | model |
| MViTV2-B | IN1K | Cascade Mask R-CNN | 36 | 54.1 | 46.7 | 103M | 814G | 308109448 | model |
| MViTV2-B | IN21K | Cascade Mask R-CNN | 36 | 54.9 | 47.4 | 103M | 814G | 309003202 | model |
| MViTV2-L | IN21K | Cascade Mask R-CNN | 50 | 55.8 | 48.3 | 270M | 1519G | 308099658 | model |
| MViTV2-H | IN21K | Cascade Mask R-CNN | 36 | 56.1 | 48.5 | 718M | 3084G | 309013744 | model |

Note that the above models were trained and measured on 8 nodes with 64 NVIDIA A100 GPUs in total. The ImageNet pre-trained model weights are obtained from the [MViTv2 repo](https://github.com/facebookresearch/mvit).
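If you want to build one of these detection models directly in Python (for example, to inspect the architecture or load a downloaded checkpoint) rather than through the training script, the snippet below is a minimal sketch using Detectron2's lazy-config API. The config and checkpoint paths are placeholders; substitute one of the config files in this project and a checkpoint downloaded from the table above.

```python
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import LazyConfig, instantiate

# Placeholder paths: point these at one of this project's config files
# and a checkpoint downloaded from the table above.
cfg = LazyConfig.load("configs/path/to/config.py")

# Build the detection model described by the config and load the pretrained weights.
model = instantiate(cfg.model)
DetectionCheckpointer(model).load("/path/to/model_checkpoint")
model.eval()
```

Running inference this way still requires preparing inputs in Detectron2's standard format (a list of dicts with an `"image"` tensor); for end-to-end COCO evaluation, use the command in the Evaluation section below.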
## Training

All configs can be trained with:

```
../../tools/lazyconfig_train_net.py --config-file configs/path/to/config.py
```

By default, we use 64 GPUs with a batch size of 64 for training.

## Evaluation

Model evaluation can be done similarly:

```
../../tools/lazyconfig_train_net.py --config-file configs/path/to/config.py --eval-only train.init_checkpoint=/path/to/model_checkpoint
```

## <a name="CitingMViTv2"></a>Citing MViTv2

If you use MViTv2, please use the following BibTeX entry.

```BibTeX
@inproceedings{li2021improved,
  title={MViTv2: Improved multiscale vision transformers for classification and detection},
  author={Li, Yanghao and Wu, Chao-Yuan and Fan, Haoqi and Mangalam, Karttikeya and Xiong, Bo and Malik, Jitendra and Feichtenhofer, Christoph},
  booktitle={CVPR},
  year={2022}
}
```