## Attention-based Extraction of Structured Information from Street View Imagery
[Papers With Code: OCR on FSNS](https://paperswithcode.com/sota/optical-character-recognition-on-fsns-test?p=attention-based-extraction-of-structured)
[arXiv:1704.03549](https://arxiv.org/abs/1704.03549)
[TensorFlow 1.15](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0)
*A TensorFlow model for real-world image text extraction problems.*
This folder contains the code needed to train a new Attention OCR model on the
[FSNS dataset][FSNS] to transcribe street names in France. You can also use
the code to train a model on your own data.
More details can be found in our paper:
["Attention-based Extraction of Structured Information from Street View
Imagery"](https://arxiv.org/abs/1704.03549)
## Contacts
Authors
* Zbigniew Wojna ([email protected])
* Alexander Gorban ([email protected])
Maintainer: Xavier Gibert [@xavigibert](https://github.com/xavigibert)
## Requirements
1. Install the TensorFlow library ([instructions][TF]). For example:
```
python3 -m venv ~/.tensorflow
source ~/.tensorflow/bin/activate
pip install --upgrade pip
pip install --upgrade tensorflow-gpu==1.15
```
2. At least 158GB of free disk space to download the FSNS dataset:
```
cd research/attention_ocr/python/datasets
aria2c -c -j 20 -i ../../../street/python/fsns_urls.txt
cd ..
```
3. 16GB of RAM or more; 32GB is recommended.
4. `train.py` works with both CPU and GPU, though using a GPU is preferable. It has been tested with a Titan X and a GTX 980.
[TF]: https://www.tensorflow.org/install/
[FSNS]: https://github.com/tensorflow/models/tree/master/research/street
## How to use this code
To run all unit tests:
```
cd research/attention_ocr/python
find . -name "*_test.py" -printf '%P\n' | xargs python3 -m unittest
```
To train from scratch:
```
python train.py
```
To train a model using pre-trained Inception weights as initialization:
```
wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz
tar xf inception_v3_2016_08_28.tar.gz
python train.py --checkpoint_inception=./inception_v3.ckpt
```
To fine-tune the Attention OCR model using a checkpoint:
```
wget http://download.tensorflow.org/models/attention_ocr_2017_08_09.tar.gz
tar xf attention_ocr_2017_08_09.tar.gz
python train.py --checkpoint=model.ckpt-399731
```
## How to use your own image data to train the model
You need to define a new dataset. There are two options:
1. Store data in the same format as the FSNS dataset and just reuse the
[python/datasets/fsns.py](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/datasets/fsns.py)
module. E.g., create a file `datasets/newtextdataset.py`:
```
import fsns

DEFAULT_DATASET_DIR = 'path/to/the/dataset'

DEFAULT_CONFIG = {
    'name': 'MYDATASET',
    'splits': {
        'train': {
            'size': 123,
            'pattern': 'tfexample_train*'
        },
        'test': {
            'size': 123,
            'pattern': 'tfexample_test*'
        }
    },
    'charset_filename': 'charset_size.txt',
    'image_shape': (150, 600, 3),
    'num_of_views': 4,
    'max_sequence_length': 37,
    'null_code': 42,
    'items_to_descriptions': {
        'image': 'A [150 x 600 x 3] color image.',
        'label': 'Character codes.',
        'text': 'A unicode string.',
        'length': 'The length of the encoded text.',
        'num_of_views': 'The number of different views stored within the image.'
    }
}


def get_split(split_name, dataset_dir=None, config=None):
  if not dataset_dir:
    dataset_dir = DEFAULT_DATASET_DIR
  if not config:
    config = DEFAULT_CONFIG
  return fsns.get_split(split_name, dataset_dir, config)
```
You will also need to register it in `datasets/__init__.py` (see the sketch
after the command below) and specify the dataset name on the command line:
```
python train.py --dataset_name=newtextdataset
```
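Depending on the repository version, the registration in `datasets/__init__.py` may be as simple as the following sketch (the module list is illustrative, not the actual file contents):
```
# datasets/__init__.py -- make the new dataset importable by name.
from datasets import fsns
from datasets import newtextdataset

__all__ = ['fsns', 'newtextdataset']
```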
Please note that `eval.py` also requires the same flag.
To learn how to store data in the FSNS
format, please refer to this answer: https://stackoverflow.com/a/44461910/743658.
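As a rough illustration of the FSNS format described in that answer, a writer for a single example might look like the sketch below. The feature keys follow the FSNS convention read by `datasets/fsns.py`; the padding values and file names are placeholders:
```
import tensorflow as tf


def _int64_feature(values):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=values))


def _bytes_feature(value):
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))


def make_example(png_bytes, width, text, char_ids_padded, char_ids_unpadded):
  """Builds a tf.Example with the feature keys datasets/fsns.py reads."""
  return tf.train.Example(features=tf.train.Features(feature={
      'image/encoded': _bytes_feature(png_bytes),
      'image/format': _bytes_feature(b'PNG'),
      'image/width': _int64_feature([width]),
      'image/orig_width': _int64_feature([width]),
      'image/class': _int64_feature(char_ids_padded),
      'image/unpadded_class': _int64_feature(char_ids_unpadded),
      'image/text': _bytes_feature(text.encode('utf-8')),
  }))

# Hypothetical usage: pad char ids with the null code up to
# max_sequence_length, then write serialized examples to a TFRecord file
# whose name matches the 'pattern' in the dataset config.
# with tf.python_io.TFRecordWriter('tfexample_train-00000-of-00001') as w:
#   w.write(make_example(...).SerializeToString())
```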
2. Define a new dataset format. The model needs the following data to train:
- images: input images, shape [batch_size x H x W x 3];
- labels: ground truth label ids, shape [batch_size x seq_length];
- labels_one_hot: labels in one-hot encoding, shape [batch_size x seq_length x num_char_classes].
Refer to [python/data_provider.py](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/data_provider.py#L33)
for more details. You can use [python/datasets/fsns.py](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/datasets/fsns.py)
as an example; see also the sketch after this list.
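For option 2, the contract is small: produce the three tensors above with consistent shapes. Below is a minimal sketch, assuming a charset of `num_char_classes` characters; the `InputEndpoints` structure mirrors the one in `data_provider.py`, but the exact field names there may differ:
```
import collections

import tensorflow as tf

InputEndpoints = collections.namedtuple(
    'InputEndpoints', ['images', 'labels', 'labels_one_hot'])


def pack_data(images, labels, num_char_classes):
  """Packs raw tensors into the structure the model trains on.

  Args:
    images: float32 tensor with shape [batch_size, H, W, 3].
    labels: int32 tensor of character ids, shape [batch_size, seq_length].
    num_char_classes: charset size, including the null character.
  """
  # One-hot encode the ground truth ids along a new trailing dimension.
  labels_one_hot = tf.one_hot(labels, depth=num_char_classes)
  return InputEndpoints(images=images, labels=labels,
                        labels_one_hot=labels_one_hot)
```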
## How to use a pre-trained model
The inference code has not been released yet, but it is fairly straightforward
to implement in Python or C++.
The recommended way is to use the [Serving infrastructure][serving].
Alternatively you can:
1. define a placeholder for images (or directly use a NumPy array)
2. [create a graph](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/eval.py#L60)
```
endpoints = model.create_base(images_placeholder, labels_one_hot=None)
```
3. [load a pretrained model](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/model.py#L494)
4. run computations through the graph:
```
predictions = sess.run(endpoints.predicted_chars,
feed_dict={images_placeholder:images_actual_data})
```
5. Convert character IDs (predictions) to UTF-8 using the provided charset file.
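Putting the steps above together, a minimal inference sketch could look like the following. It relies on `model.create_base` and `endpoints.predicted_chars` as referenced above; the `common_flags.create_model` call, the dataset accessors, and the checkpoint path are assumptions based on `eval.py` and may differ between versions:
```
import numpy as np
import tensorflow as tf

import common_flags
from datasets import fsns


def main(_):
  # Step 1: a placeholder for a batch of FSNS-sized images (4 views of
  # 150x150 pixels stitched side by side into a 150x600 image).
  images_placeholder = tf.placeholder(tf.float32, shape=[1, 150, 600, 3])

  # Step 2: build the inference graph; no ground truth labels at test time.
  dataset = fsns.get_split('test')
  ocr_model = common_flags.create_model(dataset.num_char_classes,
                                        dataset.max_sequence_length,
                                        dataset.num_of_views,
                                        dataset.null_code)
  endpoints = ocr_model.create_base(images_placeholder, labels_one_hot=None)

  # Step 3: restore the pre-trained weights downloaded earlier.
  init_fn = ocr_model.create_init_fn_to_restore('model.ckpt-399731')

  with tf.Session() as sess:
    init_fn(sess)
    # Step 4: run the graph; random pixels stand in for a real image here.
    images_actual_data = np.random.rand(1, 150, 600, 3).astype(np.float32)
    predictions = sess.run(endpoints.predicted_chars,
                           feed_dict={images_placeholder: images_actual_data})
    # Step 5: map the predicted character ids to text with the charset file.
    print(predictions)


if __name__ == '__main__':
  tf.app.run()
```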
Please note that tensor names may change over time, and old stored checkpoints can
become unloadable. In many cases such backward-incompatible changes can be
fixed with a [string substitution][1] to update the checkpoint itself, or by using a
custom var_list with [assign_from_checkpoint_fn][2] (see the sketch below). For anything
other than a one-time experiment, please use [TensorFlow Serving][serving].
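For illustration, a custom var_list remapping a renamed variable might look like this sketch; the variable and scope names are hypothetical:
```
import tensorflow as tf
from tensorflow.contrib.framework import assign_from_checkpoint_fn

# Hypothetical: a kernel saved under 'old_scope/weights' now lives under
# 'new_scope/weights'. Keys of var_list are names in the checkpoint;
# values are variables in the current graph.
with tf.variable_scope('new_scope'):
  weights = tf.get_variable('weights', shape=[3, 3])

init_fn = assign_from_checkpoint_fn('model.ckpt-399731',
                                    {'old_scope/weights': weights},
                                    ignore_missing_vars=True)

with tf.Session() as sess:
  init_fn(sess)  # restores the tensor into the renamed variable
```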
[1]: https://github.com/tensorflow/tensorflow/blob/aaf7adc/tensorflow/contrib/rnn/python/tools/checkpoint_convert.py
[2]: https://www.tensorflow.org/api_docs/python/tf/contrib/framework/assign_from_checkpoint_fn
[serving]: https://tensorflow.github.io/serving/serving_basic
## Disclaimer
This code is a modified version of the internal model we used for our paper.
It currently reaches 83.79% full-sequence accuracy after 400k steps of training.
The main differences between this version and the version used in the paper:
* For the paper we used distributed training with 50 GPU (K80) workers and
asynchronous updates, while the provided checkpoint was created with this code
after ~6 days of training on a single GPU (Titan X); it reached 81% after 24
hours of training.
* The coordinate encoding is disabled by default.