We use various synthetic and real datasets. More info is in Appendix F of the supplementary material. Some preprocessing scripts are included in [`tools/`](tools).

| Dataset | Type | Remarks |
|:-------:|:-----:|:--------|
| [MJSynth](https://www.robots.ox.ac.uk/~vgg/data/text/) | synthetic | Case-sensitive annotations were extracted from the image filenames |
| [SynthText](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) | synthetic | Processed with [`crop_by_word_bb_syn90k.py`](https://github.com/FangShancheng/ABINet/blob/main/tools/crop_by_word_bb_syn90k.py) |
| [IC13](https://rrc.cvc.uab.es/?ch=2) | real | Three archives: 857, 1015, 1095 (full) |
| [IC15](https://rrc.cvc.uab.es/?ch=4) | real | Two archives: 1811, 2077 (full) |
| [CUTE80](http://cs-chan.com/downloads_cute80_dataset.html) | real | \[1\] |
| [IIIT5k](https://cvit.iiit.ac.in/research/projects/cvit-projects/the-iiit-5k-word-dataset) | real | \[1\] |
| [SVT](http://vision.ucsd.edu/~kai/svt/) | real | \[1\] |
| [SVTP](https://openaccess.thecvf.com/content_iccv_2013/html/Phan_Recognizing_Text_with_2013_ICCV_paper.html) | real | \[1\] |
| [ArT](https://rrc.cvc.uab.es/?ch=14) | real | \[2\] |
| [LSVT](https://rrc.cvc.uab.es/?ch=16) | real | \[2\] |
| [MLT19](https://rrc.cvc.uab.es/?ch=15) | real | \[2\] |
| [RCTW17](https://rctw.vlrlab.net/dataset.html) | real | \[2\] |
| [ReCTS](https://rrc.cvc.uab.es/?ch=12) | real | \[2\] |
| [Uber-Text](https://s3-us-west-2.amazonaws.com/uber-common-public/ubertext/index.html) | real | \[2\] |
| [COCO-Text v1.4](https://rrc.cvc.uab.es/?ch=5) | real | Processed with [`coco_text_converter.py`](tools/coco_text_converter.py) |
| [COCO-Text v2.0](https://bgshih.github.io/cocotext/) | real | Processed with [`coco_2_converter.py`](tools/coco_2_converter.py) |
| [OpenVINO](https://proceedings.mlr.press/v157/krylov21a.html) | real | [Annotations](https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text/) for a subset of [Open Images](https://github.com/cvdfoundation/open-images-dataset). Processed with [`openvino_converter.py`](tools/openvino_converter.py). |
| [TextOCR](https://textvqa.org/textocr/) | real | Annotations for a subset of Open Images. Processed with [`textocr_converter.py`](tools/textocr_converter.py). A _horizontal_ version can be generated by passing `--rectify_pose`. |

\[1\] Case-sensitive annotations from [Long and Yao](https://github.com/Jyouhou/Case-Sensitive-Scene-Text-Recognition-Datasets) + [our corrections](https://github.com/baudm/Case-Sensitive-Scene-Text-Recognition-Datasets). Processed with [`case_sensitive_str_datasets_converter.py`](tools/case_sensitive_str_datasets_converter.py).

\[2\] Archives used as-is from [Baek et al.](https://github.com/ku21fan/STR-Fewer-Labels/blob/main/data.md) They are included in the dataset release for convenience. Please refer to their work for more info about the datasets.

The preprocessed archives are available here: [val + test + most of train](https://drive.google.com/drive/folders/1NYuoi7dfJVgo-zUJogh8UQZgIMpLviOE), [TextOCR + OpenVINO](https://drive.google.com/drive/folders/1D9z_YJVa6f-O0juni-yG5jcwnhvYw-qC)

The expected filesystem structure is as follows:
```
data
├── test
│   ├── ArT
│   ├── COCOv1.4
│   ├── CUTE80
│   ├── IC13_1015
│   ├── IC13_1095  # Full IC13 test set. Typically not used for benchmarking but provided here for convenience.
│   ├── IC13_857
│   ├── IC15_1811
│   ├── IC15_2077
│   ├── IIIT5k
│   ├── SVT
│   ├── SVTP
│   └── Uber
├── train
│   ├── real
│   │   ├── ArT
│   │   │   ├── train
│   │   │   └── val
│   │   ├── COCOv2.0
│   │   │   ├── train
│   │   │   └── val
│   │   ├── LSVT
│   │   │   ├── test
│   │   │   ├── train
│   │   │   └── val
│   │   ├── MLT19
│   │   │   ├── test
│   │   │   ├── train
│   │   │   └── val
│   │   ├── OpenVINO
│   │   │   ├── train_1
│   │   │   ├── train_2
│   │   │   ├── train_5
│   │   │   ├── train_f
│   │   │   └── validation
│   │   ├── RCTW17
│   │   │   ├── test
│   │   │   ├── train
│   │   │   └── val
│   │   ├── ReCTS
│   │   │   ├── test
│   │   │   ├── train
│   │   │   └── val
│   │   ├── TextOCR
│   │   │   ├── train
│   │   │   └── val
│   │   └── Uber
│   │       ├── train
│   │       └── val
│   └── synth
│       ├── MJ
│       │   ├── test
│       │   ├── train
│       │   └── val
│       └── ST
└── val
    ├── IC13
    ├── IC15
    ├── IIIT5k
    └── SVT
```
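
After extracting the archives, it can be useful to confirm that the directory layout matches the tree above before starting a training run. The following is a minimal sketch of such a check, not a script shipped in `tools/`: the function names `expected_dirs` and `missing_dirs` are hypothetical, and the directory lists are transcribed by hand from the tree. It only tests that each leaf directory exists and does not inspect its contents.

```python
from pathlib import Path

# Directory names transcribed from the expected filesystem structure.
TEST_SETS = [
    "ArT", "COCOv1.4", "CUTE80", "IC13_1015", "IC13_1095", "IC13_857",
    "IC15_1811", "IC15_2077", "IIIT5k", "SVT", "SVTP", "Uber",
]
REAL_TRAIN = {
    "ArT": ["train", "val"],
    "COCOv2.0": ["train", "val"],
    "LSVT": ["test", "train", "val"],
    "MLT19": ["test", "train", "val"],
    "OpenVINO": ["train_1", "train_2", "train_5", "train_f", "validation"],
    "RCTW17": ["test", "train", "val"],
    "ReCTS": ["test", "train", "val"],
    "TextOCR": ["train", "val"],
    "Uber": ["train", "val"],
}
# ST has no split subdirectories in the tree, hence the empty list.
SYNTH_TRAIN = {"MJ": ["test", "train", "val"], "ST": []}
VAL_SETS = ["IC13", "IC15", "IIIT5k", "SVT"]


def expected_dirs():
    """Return every leaf directory of the tree, relative to the data root."""
    paths = [f"test/{name}" for name in TEST_SETS]
    for group, datasets in (("real", REAL_TRAIN), ("synth", SYNTH_TRAIN)):
        for name, splits in datasets.items():
            if splits:
                paths.extend(f"train/{group}/{name}/{s}" for s in splits)
            else:
                paths.append(f"train/{group}/{name}")
    paths.extend(f"val/{name}" for name in VAL_SETS)
    return paths


def missing_dirs(root="data"):
    """Return the expected leaf directories that do not exist under root."""
    root = Path(root)
    return [p for p in expected_dirs() if not (root / p).is_dir()]
```

Calling `missing_dirs("data")` from the repository root returns an empty list when every directory is in place, and otherwise the relative paths that still need to be downloaded or extracted.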