|
## Attention-based Extraction of Structured Information from Street View Imagery |
|
|
|
[Papers With Code: OCR on FSNS (test)](https://paperswithcode.com/sota/optical-character-recognition-on-fsns-test?p=attention-based-extraction-of-structured)

[arXiv:1704.03549](https://arxiv.org/abs/1704.03549)

[TensorFlow 1.15](https://github.com/tensorflow/tensorflow/releases/tag/v1.15.0)
|
|
|
*A TensorFlow model for real-world image text extraction problems.* |
|
|
|
This folder contains the code needed to train a new Attention OCR model on the
[FSNS dataset][FSNS] to transcribe street names in France. You can also use
this code to train the model on your own data.
|
|
|
More details can be found in our paper: |
|
|
|
["Attention-based Extraction of Structured Information from Street View |
|
Imagery"](https://arxiv.org/abs/1704.03549) |
|
|
|
## Contacts |
|
|
|
Authors |
|
|
|
* Zbigniew Wojna ([email protected]) |
|
* Alexander Gorban ([email protected]) |
|
|
|
Maintainer: Xavier Gibert [@xavigibert](https://github.com/xavigibert) |
|
|
|
## Requirements |
|
|
|
1. Install the TensorFlow library ([instructions][TF]). For example: |
|
|
|
``` |
|
python3 -m venv ~/.tensorflow |
|
source ~/.tensorflow/bin/activate |
|
pip install --upgrade pip |
|
pip install --upgrade tensorflow-gpu==1.15
|
``` |
|
|
|
2. At least 158GB of free disk space to download the FSNS dataset: |
|
|
|
``` |
|
cd research/attention_ocr/python/datasets |
|
aria2c -c -j 20 -i ../../../street/python/fsns_urls.txt |
|
cd .. |
|
``` |
|
|
|
3. 16GB of RAM or more; 32GB is recommended. |
|
4. `train.py` works with both CPU and GPU, though using a GPU is preferable. It has been tested with a Titan X and with a GTX 980.
|
|
|
[TF]: https://www.tensorflow.org/install/ |
|
[FSNS]: https://github.com/tensorflow/models/tree/master/research/street |
|
|
|
## How to use this code |
|
|
|
To run all unit tests: |
|
|
|
``` |
|
cd research/attention_ocr/python |
|
find . -name "*_test.py" -printf '%P\n' | xargs python3 -m unittest |
|
``` |
|
|
|
To train from scratch: |
|
|
|
``` |
|
python train.py |
|
``` |
|
|
|
To train a model using pre-trained Inception weights as initialization: |
|
|
|
``` |
|
wget http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz |
|
tar xf inception_v3_2016_08_28.tar.gz |
|
python train.py --checkpoint_inception=./inception_v3.ckpt |
|
``` |
|
|
|
To fine tune the Attention OCR model using a checkpoint: |
|
|
|
``` |
|
wget http://download.tensorflow.org/models/attention_ocr_2017_08_09.tar.gz |
|
tar xf attention_ocr_2017_08_09.tar.gz |
|
python train.py --checkpoint=model.ckpt-399731 |
|
``` |
|
|
|
## How to use your own image data to train the model |
|
|
|
You need to define a new dataset. There are two options: |
|
|
|
1. Store data in the same format as the FSNS dataset and just reuse the |
|
[python/datasets/fsns.py](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/datasets/fsns.py) |
|
module. E.g., create a file `datasets/newtextdataset.py`:
|
```
import fsns

DEFAULT_DATASET_DIR = 'path/to/the/dataset'

DEFAULT_CONFIG = {
    'name': 'MYDATASET',
    'splits': {
        'train': {
            'size': 123,  # number of examples in the split
            'pattern': 'tfexample_train*'  # file pattern of the TFRecord shards
        },
        'test': {
            'size': 123,
            'pattern': 'tfexample_test*'
        }
    },
    # Text file mapping character codes to characters.
    'charset_filename': 'charset_size.txt',
    # Height, width and number of channels of the input images.
    'image_shape': (150, 600, 3),
    # Number of views stored side by side within each image.
    'num_of_views': 4,
    # Maximum length of a transcription, including null padding.
    'max_sequence_length': 37,
    # Character code used to pad transcriptions to max_sequence_length.
    'null_code': 42,
    'items_to_descriptions': {
        'image': 'A [150 x 600 x 3] color image.',
        'label': 'Character codes.',
        'text': 'A unicode string.',
        'length': 'A length of the encoded text.',
        'num_of_views': 'A number of different views stored within the image.'
    }
}


def get_split(split_name, dataset_dir=None, config=None):
  if not dataset_dir:
    dataset_dir = DEFAULT_DATASET_DIR
  if not config:
    config = DEFAULT_CONFIG

  return fsns.get_split(split_name, dataset_dir, config)
```
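
Once the file is in place, you can sanity-check the new module directly. A
quick sketch; the attribute names follow the slim `Dataset` objects returned
by `fsns.get_split`:

```
import newtextdataset

# Loads the 'train' split using the defaults defined above.
dataset = newtextdataset.get_split('train')
print(dataset.num_samples, dataset.num_char_classes)
```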
|
You will also need to register the new module in `datasets/__init__.py` (a
minimal sketch follows below) and specify the dataset name on the command
line:
|
|
|
``` |
|
python train.py --dataset_name=newtextdataset |
|
``` |
|
|
|
Please note that `eval.py` will also require the same flag.
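
Registering the module amounts to importing it in `datasets/__init__.py`. A
minimal sketch; mirror the import style of the existing entries in that file:

```
from datasets import fsns
from datasets import newtextdataset

__all__ = ['fsns', 'newtextdataset']
```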
|
|
|
To learn how to store data in the FSNS format, please refer to
https://stackoverflow.com/a/44461910/743658.
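
In short, each example is a `tf.train.Example` whose feature keys mirror
those parsed by `fsns.py`. A minimal writer sketch, assuming PNG-encoded
images and the padding/charset values from the config above:

```
import tensorflow as tf


def make_fsns_example(png_bytes, char_ids_padded, char_ids_unpadded, text,
                      width, orig_width):
  """Builds one tf.train.Example with the feature keys parsed by fsns.py."""

  def bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

  def int64_feature(values):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

  return tf.train.Example(features=tf.train.Features(feature={
      'image/encoded': bytes_feature(png_bytes),
      'image/format': bytes_feature(b'PNG'),
      'image/width': int64_feature([width]),
      'image/orig_width': int64_feature([orig_width]),
      'image/class': int64_feature(char_ids_padded),  # padded with null_code
      'image/unpadded_class': int64_feature(char_ids_unpadded),
      'image/text': bytes_feature(text.encode('utf-8')),
  }))

# Usage: write serialized examples into shards whose names match the
# 'pattern' fields of the dataset config, e.g.:
# with tf.io.TFRecordWriter('tfexample_train-00000-of-00001') as writer:
#   writer.write(make_fsns_example(...).SerializeToString())
```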
|
|
|
2. Define a new dataset format. The model needs the following data to train: |
|
|
|
- images: input images, shape [batch_size x H x W x 3]; |
|
- labels: ground truth label ids, shape [batch_size x seq_length];

- labels_one_hot: labels in one-hot encoding, shape [batch_size x seq_length x num_char_classes].
|
|
|
Refer to [python/data_provider.py](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/data_provider.py#L33) |
|
for more details. You can use [python/datasets/fsns.py](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/datasets/fsns.py) |
|
as the example. |
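
For instance, the one-hot labels can be derived directly from the integer
ids. A minimal sketch; the batch size, sequence length and charset size are
made-up values:

```
import tensorflow as tf

num_char_classes = 134  # assumed charset size, including the null character

# [batch_size, seq_length] integer character ids.
labels = tf.placeholder(tf.int32, shape=[32, 37])
# [batch_size, seq_length, num_char_classes] one-hot encoding.
labels_one_hot = tf.one_hot(labels, depth=num_char_classes)
```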
|
|
|
## How to use a pre-trained model |
|
|
|
The inference code has not been released yet, but it is straightforward to
implement in Python or C++.
|
|
|
The recommended way is to use the [Serving infrastructure][serving]. |
|
|
|
Alternatively you can: |
|
1. define a placeholder for images (or feed a NumPy array directly)
|
2. [create a graph](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/eval.py#L60):
|
``` |
|
endpoints = model.create_base(images_placeholder, labels_one_hot=None) |
|
``` |
|
3. [load a pretrained model](https://github.com/tensorflow/models/blob/master/research/attention_ocr/python/model.py#L494) |
|
4. run computations through the graph: |
|
``` |
|
predictions = sess.run(endpoints.predicted_chars, |
|
feed_dict={images_placeholder:images_actual_data}) |
|
``` |
|
5. convert character IDs (predictions) to UTF-8 using the provided charset file.
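
Putting these steps together, a minimal end-to-end sketch. The placeholder
shape assumes FSNS-sized inputs, and the `common_flags.create_dataset` /
`common_flags.create_model` helpers, the checkpoint path, and the
`dataset.charset` lookup are assumptions borrowed from the repo's training
code:

```
import numpy as np
import tensorflow as tf

import common_flags

common_flags.define()

# Assumed FSNS geometry: one 150x600 RGB image holding 4 views side by side.
images_placeholder = tf.placeholder(tf.float32, shape=[1, 150, 600, 3])
images_actual_data = np.random.rand(1, 150, 600, 3).astype('float32')

dataset = common_flags.create_dataset(split_name='test')
model = common_flags.create_model(dataset.num_char_classes,
                                  dataset.max_sequence_length,
                                  dataset.num_of_views, dataset.null_code)
endpoints = model.create_base(images_placeholder, labels_one_hot=None)

saver = tf.train.Saver()
with tf.Session() as sess:
  saver.restore(sess, 'model.ckpt-399731')  # path from the fine-tuning step
  predictions = sess.run(endpoints.predicted_chars,
                         feed_dict={images_placeholder: images_actual_data})
  # Step 5: map ids back to characters via the dataset charset (id -> char).
  text = ''.join(dataset.charset[char_id] for char_id in predictions[0])
  print(text)
```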
|
|
|
Please note that tensor names may change over time and old stored checkpoints
can become unloadable. In many cases such backward-incompatible changes can be
fixed with a [string substitution][1] to update the checkpoint itself, or by
using a custom var_list with [assign_from_checkpoint_fn][2]. For anything
other than a one-time experiment, please use [TensorFlow Serving][serving].
|
|
|
[1]: https://github.com/tensorflow/tensorflow/blob/aaf7adc/tensorflow/contrib/rnn/python/tools/checkpoint_convert.py |
|
[2]: https://www.tensorflow.org/api_docs/python/tf/contrib/framework/assign_from_checkpoint_fn |
|
[serving]: https://tensorflow.github.io/serving/serving_basic |
|
|
|
## Disclaimer |
|
|
|
This code is a modified version of the internal model we used for our paper.
It currently reaches 83.79% full sequence accuracy after 400k steps of
training. The main differences between this version and the version used in
the paper are: for the paper we used distributed training with 50 GPU (K80)
workers (asynchronous updates), while the provided checkpoint was created
with this code after ~6 days of training on a single GPU (Titan X), reaching
81% after 24 hours of training; additionally, the coordinate encoding is
disabled by default.
|
|