#1 by captchaboy - opened

README.md CHANGED
@@ -1,136 +1,11 @@
## Runtime Environment

```
pip install -r requirements.txt
```

Note: `fastai==1.0.60` is required.
## Datasets

<details>
<summary>Training datasets (Click to expand)</summary>

1. [MJSynth](http://www.robots.ox.ac.uk/~vgg/data/text/) (MJ):
   - Use `tools/create_lmdb_dataset.py` to convert images into an LMDB dataset.
   - [LMDB dataset BaiduNetdisk (passwd:n23k)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ)
2. [SynthText](http://www.robots.ox.ac.uk/~vgg/data/scenetext/) (ST):
   - Use `tools/crop_by_word_bb.py` to crop images from the original [SynthText](http://www.robots.ox.ac.uk/~vgg/data/scenetext/) dataset, then convert them into an LMDB dataset with `tools/create_lmdb_dataset.py`.
   - [LMDB dataset BaiduNetdisk (passwd:n23k)](https://pan.baidu.com/s/1mgnTiyoR8f6Cm655rFI4HQ)
3. [WikiText103](https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip), used only for pre-training the language model:
   - Use `notebooks/prepare_wikitext103.ipynb` to convert the text into CSV format.
   - [CSV dataset BaiduNetdisk (passwd:dk01)](https://pan.baidu.com/s/1yabtnPYDKqhBb_Ie9PGFXA)

</details>
<details>
<summary>Evaluation datasets (Click to expand)</summary>

- LMDB evaluation datasets can be downloaded from [BaiduNetdisk (passwd:1dbv)](https://pan.baidu.com/s/1RUg3Akwp7n8kZYJ55rU5LQ) or [GoogleDrive](https://drive.google.com/file/d/1dTI0ipu14Q1uuK4s4z32DqbqF3dJPdkk/view?usp=sharing):
  1. ICDAR 2013 (IC13)
  2. ICDAR 2015 (IC15)
  3. IIIT5K Words (IIIT)
  4. Street View Text (SVT)
  5. Street View Text-Perspective (SVTP)
  6. CUTE80 (CUTE)

</details>
<details>
<summary>The structure of the `data` directory (Click to expand)</summary>

- The structure of the `data` directory is:
```
data
├── charset_36.txt
├── evaluation
│   ├── CUTE80
│   ├── IC13_857
│   ├── IC15_1811
│   ├── IIIT5k_3000
│   ├── SVT
│   └── SVTP
├── training
│   ├── MJ
│   │   ├── MJ_test
│   │   ├── MJ_train
│   │   └── MJ_valid
│   └── ST
├── WikiText-103.csv
└── WikiText-103_eval_d1.csv
```

</details>
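As a quick sanity check before dropping the downloaded LMDB folders in place, the expected layout can be scaffolded with a short shell loop. The directory names come straight from the tree above; `charset_36.txt` and the WikiText CSVs are real data files and are not created here.

```shell
# Scaffold the expected `data` layout; only directories are created,
# the actual LMDB folders and CSV/charset files must be supplied separately.
for d in evaluation/CUTE80 evaluation/IC13_857 evaluation/IC15_1811 \
         evaluation/IIIT5k_3000 evaluation/SVT evaluation/SVTP \
         training/MJ/MJ_test training/MJ/MJ_train training/MJ/MJ_valid \
         training/ST; do
  mkdir -p "data/$d"
done
ls data/training/MJ   # MJ_test  MJ_train  MJ_valid
```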
## Pretrained Models

Get the pretrained models from [GoogleDrive](https://drive.google.com/drive/folders/1C8NMI8Od8mQUMlsnkHNLkYj73kbAQ7Bl?usp=sharing). The performance of the pretrained models is summarized as follows:

|Model|IC13|SVT|IIIT|IC15|SVTP|CUTE|AVG|
|-|-|-|-|-|-|-|-|
|IterNet|97.9|95.1|96.9|87.7|90.9|91.3|93.8|
## Training

1. Pre-train the vision model:
```
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py --config=configs/pretrain_vm.yaml
```
2. Pre-train the language model:
```
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/pretrain_language_model.yaml
```
3. Train IterNet:
```
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py --config=configs/train_iternet.yaml
```

Note:
- You can set the `checkpoint` path for the vision model (VM) and the language model separately to start from specific pretrained weights, or set it to `None` to train from scratch.
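For illustration only, a checkpoint override might look like the fragment below. The key names and paths here are assumptions made for the sketch, not copied from the repo's actual config files, so check `configs/train_iternet.yaml` for the real schema.

```yaml
# Hypothetical config fragment; real key names may differ.
model:
  vision_checkpoint: workdir/pretrain-vm/best.pth      # or None to train from scratch
  language_checkpoint: workdir/pretrain-lm/best.pth    # or None
```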
## Evaluation

```
CUDA_VISIBLE_DEVICES=0 python main.py --config=configs/train_iternet.yaml --phase test --image_only
```

Additional flags:
- `--checkpoint /path/to/checkpoint` sets the path of the model to evaluate
- `--test_root /path/to/dataset` sets the path of the evaluation dataset
- `--model_eval [alignment|vision]` selects which sub-model to evaluate
- `--image_only` disables dumping visualizations of attention masks
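Putting those flags together, a combined evaluation invocation might look like the following. The checkpoint and dataset paths are placeholders, not files shipped with the repo; the snippet only assembles the command as a string so the flag spelling can be checked without a GPU or data.

```shell
# Hypothetical combined run; substitute a real checkpoint and LMDB test root.
CMD='CUDA_VISIBLE_DEVICES=0 python main.py --config=configs/train_iternet.yaml --phase test --checkpoint /path/to/checkpoint --test_root /path/to/dataset --model_eval vision --image_only'
echo "$CMD"
```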
## Run Demo

<a href="https://colab.research.google.com/drive/1XmZGJzFF95uafmARtJMudPLLKBO2eXLv?usp=sharing"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"></a>

```
python demo.py --config=configs/train_iternet.yaml --input=figures/demo
```

Additional flags:
- `--config /path/to/config` sets the path of the configuration file
- `--input /path/to/image-directory` sets the path of an image directory or a wildcard path, e.g. `--input='figs/test/*.png'`
- `--checkpoint /path/to/checkpoint` sets the path of the trained model
- `--cuda [-1|0|1|2|3...]` sets the CUDA device id; the default is -1, which stands for CPU
- `--model_eval [alignment|vision]` selects which sub-model to use
- `--image_only` disables dumping visualizations of attention masks
## Citation

If you find our method useful for your research, please cite:

```bibtex
@article{chu2022itervm,
  title={IterVM: Iterative Vision Modeling Module for Scene Text Recognition},
  author={Chu, Xiaojie and Wang, Yongtao},
  journal={arXiv preprint arXiv:2204.02630},
  year={2022}
}
```
## License

This project is free for academic research purposes only; commercial use requires authorization. For commercial permission, please contact [email protected].

## Acknowledgements

This project is based on [ABINet](https://github.com/FangShancheng/ABINet.git). Thanks for their great work.
---
title: Pixelplanet OCR
emoji: π
colorFrom: indigo
colorTo: red
sdk: gradio
sdk_version: 2.8.12
app_file: app.py
pinned: false
license: bsd
---