We use various synthetic and real datasets. More info is in Appendix F of the supplementary material. Some preprocessing scripts are included in [`tools/`](tools).

| Dataset | Type | Remarks |
|:-------:|:-----:|:--------|
| [MJSynth](https://www.robots.ox.ac.uk/~vgg/data/text/) | synthetic | Case-sensitive annotations were extracted from the image filenames |
| [SynthText](https://www.robots.ox.ac.uk/~vgg/data/scenetext/) | synthetic | Processed with [`crop_by_word_bb_syn90k.py`](https://github.com/FangShancheng/ABINet/blob/main/tools/crop_by_word_bb_syn90k.py) |
| [IC13](https://rrc.cvc.uab.es/?ch=2) | real | Three archives: 857, 1015, 1095 (full) |
| [IC15](https://rrc.cvc.uab.es/?ch=4) | real | Two archives: 1811, 2077 (full) |
| [CUTE80](http://cs-chan.com/downloads_cute80_dataset.html) | real | \[1\] |
| [IIIT5k](https://cvit.iiit.ac.in/research/projects/cvit-projects/the-iiit-5k-word-dataset) | real | \[1\] |
| [SVT](http://vision.ucsd.edu/~kai/svt/) | real | \[1\] |
| [SVTP](https://openaccess.thecvf.com/content_iccv_2013/html/Phan_Recognizing_Text_with_2013_ICCV_paper.html) | real | \[1\] |
| [ArT](https://rrc.cvc.uab.es/?ch=14) | real | \[2\] |
| [LSVT](https://rrc.cvc.uab.es/?ch=16) | real | \[2\] |
| [MLT19](https://rrc.cvc.uab.es/?ch=15) | real | \[2\] |
| [RCTW17](https://rctw.vlrlab.net/dataset.html) | real | \[2\] |
| [ReCTS](https://rrc.cvc.uab.es/?ch=12) | real | \[2\] |
| [Uber-Text](https://s3-us-west-2.amazonaws.com/uber-common-public/ubertext/index.html) | real | \[2\] |
| [COCO-Text v1.4](https://rrc.cvc.uab.es/?ch=5) | real | Processed with [`coco_text_converter.py`](tools/coco_text_converter.py) |
| [COCO-Text v2.0](https://bgshih.github.io/cocotext/) | real | Processed with [`coco_2_converter.py`](tools/coco_2_converter.py) |
| [OpenVINO](https://proceedings.mlr.press/v157/krylov21a.html) | real | [Annotations](https://storage.openvinotoolkit.org/repositories/openvino_training_extensions/datasets/open_images_v5_text/) for a subset of [Open Images](https://github.com/cvdfoundation/open-images-dataset). Processed with [`openvino_converter.py`](tools/openvino_converter.py). |
| [TextOCR](https://textvqa.org/textocr/) | real | Annotations for a subset of Open Images. Processed with [`textocr_converter.py`](tools/textocr_converter.py). A _horizontal_ version can be generated by passing `--rectify_pose`. |

\[1\] Case-sensitive annotations from [Long and Yao](https://github.com/Jyouhou/Case-Sensitive-Scene-Text-Recognition-Datasets) + [our corrections](https://github.com/baudm/Case-Sensitive-Scene-Text-Recognition-Datasets). Processed with [`case_sensitive_str_datasets_converter.py`](tools/case_sensitive_str_datasets_converter.py).<br/>
\[2\] Archives used as-is from [Baek et al.](https://github.com/ku21fan/STR-Fewer-Labels/blob/main/data.md) They are included in the dataset release for convenience. Please refer to their work for more info about the datasets.
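
The MJSynth remark in the table mentions that case-sensitive annotations were extracted from the image filenames. The snippet below is only a rough sketch of that idea, not the actual preprocessing code. It assumes the filename pattern `<index>_<word>_<lexicon id>.jpg` (an assumption about the MJSynth release, not something stated in this document), and `label_from_mjsynth_filename` is a hypothetical helper name.

```python
from pathlib import Path


def label_from_mjsynth_filename(path):
    """Return the word embedded in an MJSynth-style image filename.

    Assumed pattern (not confirmed by this document):
        <index>_<word>_<lexicon id>.jpg
    The original casing of <word> is preserved.
    """
    stem = Path(path).stem          # drop directories and the extension
    parts = stem.split("_")
    if len(parts) < 3:
        raise ValueError(f"unexpected filename format: {path!r}")
    # Keep everything between the first and last fields as the label.
    return "_".join(parts[1:-1])


label = label_from_mjsynth_filename("2697/6/466_MONIKER_49537.jpg")
```

Here `label` keeps the casing found in the filename, which is what makes the resulting annotations case-sensitive.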
The preprocessed archives are available here: [val + test + most of train](https://drive.google.com/drive/folders/1NYuoi7dfJVgo-zUJogh8UQZgIMpLviOE), [TextOCR + OpenVINO](https://drive.google.com/drive/folders/1D9z_YJVa6f-O0juni-yG5jcwnhvYw-qC).

The expected filesystem structure is as follows:
```
data
├── test
│   ├── ArT
│   ├── COCOv1.4
│   ├── CUTE80
│   ├── IC13_1015
│   ├── IC13_1095  # Full IC13 test set. Typically not used for benchmarking but provided here for convenience.
│   ├── IC13_857
│   ├── IC15_1811
│   ├── IC15_2077
│   ├── IIIT5k
│   ├── SVT
│   ├── SVTP
│   └── Uber
├── train
│   ├── real
│   │   ├── ArT
│   │   │   ├── train
│   │   │   └── val
│   │   ├── COCOv2.0
│   │   │   ├── train
│   │   │   └── val
│   │   ├── LSVT
│   │   │   ├── test
│   │   │   ├── train
│   │   │   └── val
│   │   ├── MLT19
│   │   │   ├── test
│   │   │   ├── train
│   │   │   └── val
│   │   ├── OpenVINO
│   │   │   ├── train_1
│   │   │   ├── train_2
│   │   │   ├── train_5
│   │   │   ├── train_f
│   │   │   └── validation
│   │   ├── RCTW17
│   │   │   ├── test
│   │   │   ├── train
│   │   │   └── val
│   │   ├── ReCTS
│   │   │   ├── test
│   │   │   ├── train
│   │   │   └── val
│   │   ├── TextOCR
│   │   │   ├── train
│   │   │   └── val
│   │   └── Uber
│   │       ├── train
│   │       └── val
│   └── synth
│       ├── MJ
│       │   ├── test
│       │   ├── train
│       │   └── val
│       └── ST
└── val
    ├── IC13
    ├── IC15
    ├── IIIT5k
    └── SVT
```
```
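
After extracting the archives, it may help to check that the directories landed where the tree above expects them. The following is a minimal sketch of such a check and is not part of the repository: `EXPECTED_DIRS` and `missing_dirs` are hypothetical names, and the list covers only a subset of the tree for brevity.

```python
from pathlib import Path

# Illustrative subset of the layout shown above, relative to the data root.
# Extend it with the remaining directories as needed.
EXPECTED_DIRS = [
    "test/CUTE80",
    "test/IC13_857",
    "test/IC15_1811",
    "test/IIIT5k",
    "test/SVT",
    "test/SVTP",
    "train/real/ArT/train",
    "train/real/TextOCR/train",
    "train/synth/MJ/train",
    "train/synth/ST",
    "val/IC13",
    "val/IC15",
    "val/IIIT5k",
    "val/SVT",
]


def missing_dirs(root, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]
```

Calling `missing_dirs("data")` from the directory that contains `data` returns an empty list when every listed directory is present, and otherwise the relative paths that still need to be downloaded or moved.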