# P2PNet (ICCV2021 Oral Presentation)

This repository contains the code for the official PyTorch implementation of **P2PNet**, as described in [Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework](https://arxiv.org/abs/2107.12746).

A brief introduction of P2PNet can be found at [机器之心 (almosthuman)](https://mp.weixin.qq.com/s?__biz=MzA3MzI4MjgzMw==&mid=2650827826&idx=3&sn=edd3d66444130fb34a59d08fab618a9e&chksm=84e5a84cb392215a005a3b3424f20a9d24dc525dcd933960035bf4b6aa740191b5ecb2b7b161&mpshare=1&scene=1&srcid=1004YEOC7HC9daYRYeUio7Xn&sharer_sharetime=1633675738338&sharer_shareid=7d375dccd3b2f9eec5f8b27ee7c04883&version=3.1.16.5505&platform=win#rd).

The code is tested with PyTorch 1.5.0 and may not run with other versions.

## Visualized demos for P2PNet

## The network

The overall architecture of P2PNet is built upon a VGG16 backbone: it first introduces an upsampling path to obtain a fine-grained feature map, then exploits two branches to simultaneously predict a set of point proposals and their confidence scores.
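To make the two-branch design concrete, below is a minimal, illustrative PyTorch sketch of such a prediction head. The module name `PointHead`, the channel width, and the number of anchor points per location are assumptions made for this sketch, not necessarily the exact layers or configuration used in this repository.

```python
import torch
import torch.nn as nn

class PointHead(nn.Module):
    """Illustrative dual-branch head: one branch regresses point offsets,
    the other predicts a confidence score for every point proposal."""

    def __init__(self, in_channels=256, num_anchor_points=4):
        super().__init__()
        # Regression branch: 2 coordinates (x, y offsets) per anchor point.
        self.regression = nn.Conv2d(in_channels, num_anchor_points * 2, kernel_size=3, padding=1)
        # Classification branch: 1 confidence score per anchor point.
        self.classification = nn.Conv2d(in_channels, num_anchor_points, kernel_size=3, padding=1)

    def forward(self, feat):
        # feat: (B, C, H, W) fine-grained feature map from the upsampling path.
        offsets = self.regression(feat)      # (B, K*2, H, W)
        scores = self.classification(feat)   # (B, K, H, W)
        return offsets, scores

if __name__ == "__main__":
    head = PointHead()
    offsets, scores = head(torch.randn(1, 256, 64, 64))
    print(offsets.shape, scores.shape)  # torch.Size([1, 8, 64, 64]) torch.Size([1, 4, 64, 64])
```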
## Comparison with state-of-the-art methods

The P2PNet achieved state-of-the-art performance on several challenging datasets with various densities.

| Methods | Venue | SHTechPartA MAE/MSE | SHTechPartB MAE/MSE | UCF_CC_50 MAE/MSE | UCF_QNRF MAE/MSE |
|:----:|:----:|:----:|:----:|:----:|:----:|
| CAN | CVPR'19 | 62.3/100.0 | 7.8/12.2 | 212.2/**243.7** | 107.0/183.0 |
| Bayesian+ | ICCV'19 | 62.8/101.8 | 7.7/12.7 | 229.3/308.2 | 88.7/154.8 |
| S-DCNet | ICCV'19 | 58.3/95.0 | 6.7/10.7 | 204.2/301.3 | 104.4/176.1 |
| SANet+SPANet | ICCV'19 | 59.4/92.5 | 6.5/**9.9** | 232.6/311.7 | -/- |
| DUBNet | AAAI'20 | 64.6/106.8 | 7.7/12.5 | 243.8/329.3 | 105.6/180.5 |
| SDANet | AAAI'20 | 63.6/101.8 | 7.8/10.2 | 227.6/316.4 | -/- |
| ADSCNet | CVPR'20 | 55.4/97.7 | 6.4/11.3 | 198.4/267.3 | **71.3**/**132.5** |
| ASNet | CVPR'20 | 57.78/90.13 | -/- | 174.84/251.63 | 91.59/159.71 |
| AMRNet | ECCV'20 | 61.59/98.36 | 7.02/11.00 | 184.0/265.8 | 86.6/152.2 |
| AMSNet | ECCV'20 | 56.7/93.4 | 6.7/10.2 | 208.4/297.3 | 101.8/163.2 |
| DM-Count | NeurIPS'20 | 59.7/95.7 | 7.4/11.8 | 211.0/291.5 | 85.6/148.3 |
| **Ours** | - | **52.74**/**85.06** | **6.25**/**9.9** | **172.72**/256.18 | 85.32/154.5 |

Comparison on the [NWPU-Crowd](https://www.crowdbenchmark.com/resultdetail.html?rid=81) dataset.

| Methods | MAE[O] | MSE[O] | MAE[L] | MAE[S] |
|:----:|:----:|:----:|:----:|:----:|
| MCNN | 232.5 | 714.6 | 220.9 | 1171.9 |
| SANet | 190.6 | 491.4 | 153.8 | 716.3 |
| CSRNet | 121.3 | 387.8 | 112.0 | 522.7 |
| PCC-Net | 112.3 | 457.0 | 111.0 | 777.6 |
| CANNet | 110.0 | 495.3 | 102.3 | 718.3 |
| Bayesian+ | 105.4 | 454.2 | 115.8 | 750.5 |
| S-DCNet | 90.2 | 370.5 | **82.9** | 567.8 |
| DM-Count | 88.4 | 388.6 | 88.0 | **498.0** |
| **Ours** | **77.44** | **362** | 83.28 | 553.92 |

The overall performance for both counting and localization.

| nAP$_{\delta}$ | SHTechPartA | SHTechPartB | UCF_CC_50 | UCF_QNRF | NWPU_Crowd |
|:----:|:----:|:----:|:----:|:----:|:----:|
| $\delta=0.05$ | 10.9\% | 23.8\% | 5.0\% | 5.9\% | 12.9\% |
| $\delta=0.25$ | 70.3\% | 84.2\% | 54.5\% | 55.4\% | 71.3\% |
| $\delta=0.50$ | 90.1\% | 94.1\% | 88.1\% | 83.2\% | 89.1\% |
| $\delta=\{0.05:0.05:0.50\}$ | 64.4\% | 76.3\% | 54.3\% | 53.1\% | 65.0\% |

Comparison of the localization performance in terms of F1-Measure on NWPU.

| Method | F1-Measure | Precision | Recall |
|:----:|:----:|:----:|:----:|
| FasterRCNN | 0.068 | 0.958 | 0.035 |
| TinyFaces | 0.567 | 0.529 | 0.611 |
| RAZ | 0.599 | 0.666 | 0.543 |
| Crowd-SDNet | 0.637 | 0.651 | 0.624 |
| PDRNet | 0.653 | 0.675 | 0.633 |
| TopoCount | 0.692 | 0.683 | **0.701** |
| D2CNet | 0.700 | **0.741** | 0.662 |
| **Ours** | **0.712** | 0.729 | 0.695 |
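Both nAP and the F1-Measure above are computed by matching predicted points to ground-truth points one-to-one within a distance threshold. The snippet below is a minimal, self-contained sketch of such a matching-based precision/recall/F1 computation using Hungarian matching via `scipy.optimize.linear_sum_assignment`; the threshold handling here is an illustrative assumption, not the exact evaluation protocol of the benchmarks.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_points_f1(pred, gt, threshold):
    """Illustrative localization metric: one-to-one matching of predicted points
    to ground-truth points; a matched pair counts as a true positive only if the
    two points are closer than `threshold` (in pixels)."""
    if len(pred) == 0 or len(gt) == 0:
        return 0.0, 0.0, 0.0
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    # Pairwise Euclidean distances between predictions (rows) and ground truth (cols).
    dist = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(dist)          # minimal-cost one-to-one matching
    tp = int((dist[rows, cols] < threshold).sum())    # matched pairs within the threshold
    precision = tp / len(pred)
    recall = tp / len(gt)
    f1 = 2 * precision * recall / (precision + recall + 1e-12)
    return precision, recall, f1

# Toy usage with made-up coordinates.
print(match_points_f1([[10, 10], [52, 48]], [[11, 9], [50, 50], [90, 90]], threshold=8))
```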
## Installation

* Clone this repo into a directory named P2PNET_ROOT
* Organize your datasets as required
* Install Python dependencies. We use python 3.6.5 and pytorch 1.5.0

```
pip install -r requirements.txt
```

## Organize the counting dataset

We use a list file to collect all the images and their ground truth annotations in a counting dataset. When your dataset is organized as recommended below, the format of this list file is defined as:

```
train/scene01/img01.jpg train/scene01/img01.txt
train/scene01/img02.jpg train/scene01/img02.txt
...
train/scene02/img01.jpg train/scene02/img01.txt
```

### Dataset structures:

```
DATA_ROOT/
        |->train/
        |    |->scene01/
        |    |->scene02/
        |    |->...
        |->test/
        |    |->scene01/
        |    |->scene02/
        |    |->...
        |->train.list
        |->test.list
```

DATA_ROOT is your path containing the counting datasets.

### Annotations format

For the annotations of each image, we use a single txt file which contains one annotation per line. Note that indexing for pixel values starts at 0. The expected format of each line is:

```
x1 y1
x2 y2
...
```
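For reference, here is a small Python sketch of how a list file and its annotation files in this layout could be parsed. The helper names (`load_list_file`, `load_points`) and the use of paths relative to DATA_ROOT are assumptions for illustration; the repository's own data loader may differ.

```python
import os

def load_list_file(data_root, list_name):
    """Read a list file where each line is '<image_path> <annotation_path>'
    relative to DATA_ROOT, and return absolute path pairs."""
    pairs = []
    with open(os.path.join(data_root, list_name)) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            img_rel, ann_rel = line.split()
            pairs.append((os.path.join(data_root, img_rel),
                          os.path.join(data_root, ann_rel)))
    return pairs

def load_points(ann_path):
    """Read one annotation txt file: one 'x y' point per line (0-indexed pixels)."""
    points = []
    with open(ann_path) as f:
        for line in f:
            if line.strip():
                x, y = map(float, line.split())
                points.append((x, y))
    return points

# Example usage (paths are placeholders):
# pairs = load_list_file("/path/to/DATA_ROOT", "train.list")
# points = load_points(pairs[0][1])
```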
## Training

The network can be trained using the `train.py` script. For training on SHTechPartA, use

```
CUDA_VISIBLE_DEVICES=0 python train.py --data_root $DATA_ROOT \
    --dataset_file SHHA \
    --epochs 3500 \
    --lr_drop 3500 \
    --output_dir ./logs \
    --checkpoints_dir ./weights \
    --tensorboard_dir ./logs \
    --lr 0.0001 \
    --lr_backbone 0.00001 \
    --batch_size 8 \
    --eval_freq 1 \
    --gpu_id 0
```

By default, a periodic evaluation will be conducted on the validation set.

## Testing

A trained model on SHTechPartA (with an MAE of **51.96**) is available at `./weights`. Run the following command to launch a visualization demo:

```
CUDA_VISIBLE_DEVICES=0 python run_test.py --weight_path ./weights/SHTechA.pth --output_dir ./logs/
```

## Acknowledgements

- Part of the code is borrowed from the [C^3 Framework](https://github.com/gjy3035/C-3-Framework).
- We refer to [DETR](https://github.com/facebookresearch/detr) to implement our matching strategy.

## Citing P2PNet

If you find P2PNet useful in your project, please consider citing us:

```BibTeX
@inproceedings{song2021rethinking,
  title={Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework},
  author={Song, Qingyu and Wang, Changan and Jiang, Zhengkai and Wang, Yabiao and Tai, Ying and Wang, Chengjie and Li, Jilin and Huang, Feiyue and Wu, Yang},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2021}
}
```

## Related works from Tencent Youtu Lab

- [AAAI2021] To Choose or to Fuse? Scale Selection for Crowd Counting. ([paper link](https://ojs.aaai.org/index.php/AAAI/article/view/16360) & [codes](https://github.com/TencentYoutuResearch/CrowdCounting-SASNet))
- [ICCV2021] Uniformity in Heterogeneity: Diving Deep into Count Interval Partition for Crowd Counting. ([paper link](https://arxiv.org/abs/2107.12619) & [codes](https://github.com/TencentYoutuResearch/CrowdCounting-UEPNet))