
## Getting Started with Mask-Adapter

This document provides a brief intro to the usage of Mask-Adapter.

Please see Getting Started with Detectron2 for full usage.

### Inference Demo with Pre-trained Models

We provide `demo.py`, which can run a demo with the builtin configs. Run it with:

```
cd demo/
python demo.py \
  --input input1.jpg input2.jpg \
  [--other-options]
  --opts MODEL.WEIGHTS /path/to/checkpoint_file
```

The configs are made for training, so we need to specify `MODEL.WEIGHTS` to a model from the model zoo for evaluation. This command will run inference and show visualizations in an OpenCV window.

For details of the command-line arguments, see `demo.py -h` or look at its source code to understand its behavior. Some common arguments are listed below (a combined example follows the list):

* To run on your webcam, replace `--input files` with `--webcam`.
* To run on a video, replace `--input files` with `--video-input video.mp4`.
* To run on cpu, add `MODEL.DEVICE cpu` after `--opts`.
* To save outputs to a directory (for images) or a file (for webcam or video), use `--output`.
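
For example, the options above can be combined into a single command. The video and output filenames below are placeholders, and running on CPU is optional:

```
# Run the demo on a video on CPU and save the visualization to a file
# (video.mp4, output.mp4 and the checkpoint path are placeholders).
python demo.py \
  --video-input video.mp4 \
  --output output.mp4 \
  --opts MODEL.WEIGHTS /path/to/checkpoint_file MODEL.DEVICE cpu
```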

### Ground-truth Warmup Training

We provide the script `train_net_maskadapter.py` to train the mask-adapter using ground-truth masks. To train a model with `train_net_maskadapter.py`, first set up the corresponding datasets as described in datasets/README.md, and then run the following command:

```
python train_net_maskadapter.py --num-gpus 4 \
  --config-file configs/ground-truth-warmup/mask-adapter/mask_adapter_convnext_large_cocopan_eval_ade20k.yaml
```

For the MAFTP model, run:

```
python train_net_maskadapter.py --num-gpus 4 \
  --config-file configs/ground-truth-warmup/mask-adapter/mask_adapter_maft_convnext_large_cocostuff_eval_ade20k.yaml \
  MODEL.WEIGHTS /path/to/maftp_l.pth
```

The configurations are set for 4-GPU training. Since we use the AdamW optimizer, there is no established rule for scaling the learning rate with batch size. If training with a single GPU, you will need to adjust the learning rate and batch size manually:

```
python train_net_maskadapter.py \
  --config-file configs/ground-truth-warmup/mask-adapter/mask_adapter_convnext_large_cocopan_eval_ade20k.yaml \
  --num-gpus 1 SOLVER.IMS_PER_BATCH SET_TO_SOME_REASONABLE_VALUE SOLVER.BASE_LR SET_TO_SOME_REASONABLE_VALUE
```

### Combining Mask-Adapter Weights with Mask2Former

Since the ground-truth warmup phase trains only the mask-adapter and does not train Mask2Former, the checkpoint obtained in the first phase will not include Mask2Former weights. To combine the weights, run the following command:

```
python tools/weight_fuse.py \
  --model_first_phase_path /path/to/first_phase.pth \
  --model_sem_seg_path /path/to/maftp_l.pth \
  --output_path /path/to/maftp_l_withadapter.pth
```
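
Once fused, the combined checkpoint can be used like any other full checkpoint. As a usage sketch (assuming the fused model matches the MAFTP evaluation config shown later in this document), you could evaluate it directly:

```
# Evaluate the fused checkpoint; config path follows the MAFTP evaluation
# command below, and the weights path is the --output_path from the fuse step.
python train_net_maftp.py \
  --config-file configs/mixed-mask-training/maftp/semantic/train_semantic_large_eval_a150.yaml \
  --eval-only MODEL.WEIGHTS /path/to/maftp_l_withadapter.pth
```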

### Mixed-Masks Training

For the mixed-masks training phase, we provide two scripts, `train_net_fcclip.py` and `train_net_maftp.py`, which train the mask-adapter for the FC-CLIP and MAFTP models, respectively. The two models use different CLIP backbones and different training data. For FC-CLIP, run:

```
python train_net_fcclip.py --num-gpus 4 \
  --config-file configs/mixed-mask-training/fc-clip/fcclip/fcclip_convnext_large_eval_ade20k.yaml \
  MODEL.WEIGHTS /path/to/checkpoint_file
```

For MAFTP, run:

```
python train_net_maftp.py --num-gpus 4 \
  --config-file configs/mixed-mask-training/maftp/semantic/train_semantic_large_eval_a150.yaml \
  MODEL.WEIGHTS /path/to/checkpoint_file
```

To evaluate a model's performance, use the following commands. For FC-CLIP:

```
python train_net_fcclip.py \
  --config-file configs/mixed-mask-training/fc-clip/fcclip/fcclip_convnext_large_eval_ade20k.yaml \
  --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
```

For MAFTP, use:

```
python train_net_maftp.py \
  --config-file configs/mixed-mask-training/maftp/semantic/train_semantic_large_eval_a150.yaml \
  --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
```