Detectron2 Cascade-RCNN with FPN and Group Normalization on ResNeXt-50 (32x4d) trained on PubLayNet for Document Layout Analysis

The model has been trained with the TensorFlow training toolkit Tensorpack and then transferred to PyTorch using a conversion script. The TensorFlow and PyTorch models differ slightly (padding ...); however, validating both models gives a difference of less than 0.03 mAP.

A second model has been added, for which the Tensorpack model was used as the initial checkpoint and training was resumed for 20K iterations. This second model now outperforms the Tensorpack model.

Please check: Xu Zhong et al. - PubLayNet: largest dataset ever for document layout analysis.

This model is different from the model used in the paper.

The code has been adapted so that it can be used in a deepdoctection pipeline.

How this model can be used

This model can be used with deepdoctection in a full pipeline, along with table recognition and OCR. For general instructions, see this Get_started tutorial.
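As a rough illustration of the pipeline usage described above, the following sketch wraps the default deepdoctection analyzer in a small generator. The function name `iter_page_summaries` and the summary keys are hypothetical, and the page attributes used (`file_name`, `layouts`, `tables`, `text`) may differ between deepdoctection versions. Whether the default analyzer loads this exact checkpoint depends on the installed version and its configuration, so treat this as a starting point rather than a reference implementation.

```python
from typing import Iterator


def iter_page_summaries(path: str) -> Iterator[dict]:
    """Yield a small summary dict for every page found under `path`.

    `path` is assumed to point to a PDF file or a directory of images.
    """
    # Imported lazily so this module can be loaded without deepdoctection installed.
    import deepdoctection as dd

    analyzer = dd.get_dd_analyzer()    # builds the default pipeline
    df = analyzer.analyze(path=path)   # returns a lazy dataflow
    df.reset_state()                   # must be called before iterating

    for page in df:
        yield {
            "file_name": page.file_name,
            # Layout segments (e.g. text, title, table) found by the layout model.
            "layout_categories": [layout.category_name for layout in page.layouts],
            "num_tables": len(page.tables),
            "text": page.text,
        }


# Example call (the path is a placeholder):
# for summary in iter_page_summaries("path/to/documents"):
#     print(summary["file_name"], summary["layout_categories"])
```

The deepdoctection import sits inside the function so that the pipeline, and any model download it triggers, only starts when the generator is actually consumed.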

This is an inference model only

To reduce the size of the checkpoint, we removed all variables that are not necessary for inference. Therefore, it cannot be used for fine-tuning. To fine-tune this model, please use TensorFlow as well as its training script. More information can be found in this model card.
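To illustrate what removing non-inference variables can look like, here is a minimal sketch on a plain dictionary. It is not the script used to produce this checkpoint. The function name `strip_training_state` is hypothetical, and the keys `"model"`, `"optimizer"`, `"scheduler"` and `"iteration"` are assumptions based on a common PyTorch checkpoint layout.

```python
from typing import Any, Dict, Iterable


def strip_training_state(
    checkpoint: Dict[str, Any], keep: Iterable[str] = ("model",)
) -> Dict[str, Any]:
    """Return a new dict that contains only the entries needed for inference."""
    keep_keys = set(keep)
    return {key: value for key, value in checkpoint.items() if key in keep_keys}


# Toy checkpoint; the key names and values are assumptions for illustration only.
full = {
    "model": {"backbone.weight": [0.1, 0.2]},
    "optimizer": {"momentum_buffer": [0.0, 0.0]},
    "scheduler": {"last_epoch": 20000},
    "iteration": 20000,
}
slim = strip_training_state(full)
```

Once the optimizer and scheduler state are gone, training cannot be resumed from the slimmed file, which is why fine-tuning needs a full checkpoint.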