|
# Image to Text |
|
|
|
--------- |
|
|
|
**WARNING**: This example is based on the [legacy version of OpenNMT-py](https://github.com/OpenNMT/OpenNMT-py/tree/legacy)! |
|
|
|
--------- |
|
|
|
Im2Text is a deep learning-based approach to image-to-text conversion, built on top of the <a href="http://opennmt.net/">OpenNMT</a> system. It is completely data-driven and can therefore be applied to a variety of image-to-text problems, such as image captioning, optical character recognition, and LaTeX decompilation.
|
|
|
Take LaTeX decompilation as an example. Given a formula image:
|
|
|
<p align="center"><img src="http://lstm.seas.harvard.edu/latex/results/website/images/119b93a445-orig.png"></p> |
|
|
|
The goal is to infer the LaTeX source that compiles to this image:
|
|
|
``` |
|
d s _ { 1 1 } ^ { 2 } = d x ^ { + } d x ^ { - } + l _ { p } ^ { 9 } \frac { p _ { - } } { r ^ { 7 } } \delta ( x ^ { - } ) d x ^ { - } d x ^ { - } + d x _ { 1 } ^ { 2 } + \; \cdots \; + d x _ { 9 } ^ { 2 } |
|
``` |
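
Note that the prediction is a sequence of space-separated tokens. Since TeX ignores spaces in math mode, the output can be compiled as-is to check that it reproduces the original image. A minimal sketch (the wrapper document below is ours, not part of the toolkit):

```latex
% Minimal wrapper (assumed): paste the predicted token sequence into math mode
% and compile; the extra spaces between tokens are ignored by TeX.
\documentclass{standalone}
\begin{document}
$ d s _ { 1 1 } ^ { 2 } = d x ^ { + } d x ^ { - } + l _ { p } ^ { 9 } \frac { p _ { - } } { r ^ { 7 } } \delta ( x ^ { - } ) d x ^ { - } d x ^ { - } + d x _ { 1 } ^ { 2 } + \; \cdots \; + d x _ { 9 } ^ { 2 } $
\end{document}
```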
|
|
|
The paper [[What You Get Is What You See: A Visual Markup Decompiler]](https://arxiv.org/pdf/1609.04938.pdf) provides more technical details of this model. |
|
|
|
### Dependencies |
|
|
|
* `torchvision`: `conda install torchvision` |
|
* `Pillow`: `pip install Pillow` |
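
Optionally, check that both packages are importable before moving on (the version attributes are a convenience and may be named differently in older releases):

```bash
python -c "import torchvision, PIL; print(torchvision.__version__, PIL.__version__)"
```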
|
|
|
### Quick Start |
|
|
|
To get started, we provide a toy Math-to-LaTeX example. We assume that the working directory is `OpenNMT-py` throughout this document.
|
|
|
Im2Text consists of four commands: |
|
|
|
0) Download the data. |
|
|
|
```bash
wget -O data/im2text.tgz http://lstm.seas.harvard.edu/latex/im2text_small.tgz; tar zxf data/im2text.tgz -C data/
```
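
Optionally, inspect the unpacked toy dataset. It should contain the image directory and the source/target files used by the commands below (the exact contents here are inferred from those commands):

```bash
# Expected layout (approximate): images/ plus src-{train,val,test}.txt and
# tgt-{train,val}.txt, with one image path or one tokenized label per line.
ls data/im2text
head -2 data/im2text/src-train.txt data/im2text/tgt-train.txt
```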
|
|
|
1) Preprocess the data. |
|
|
|
```bash
onmt_preprocess -data_type img \
                -src_dir data/im2text/images/ \
                -train_src data/im2text/src-train.txt \
                -train_tgt data/im2text/tgt-train.txt \
                -valid_src data/im2text/src-val.txt \
                -valid_tgt data/im2text/tgt-val.txt \
                -save_data data/im2text/demo \
                -tgt_seq_length 150 \
                -tgt_words_min_frequency 2 \
                -shard_size 500 \
                -image_channel_size 1
```
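
Preprocessing writes the vocabulary and data shards under the `-save_data` prefix. A quick way to confirm it ran (exact file names may differ between OpenNMT-py versions):

```bash
# Vocabulary and train/valid shards are saved under the data/im2text/demo prefix.
ls data/im2text/demo*
```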
|
|
|
2) Train the model. |
|
|
|
```bash
onmt_train -model_type img \
           -data data/im2text/demo \
           -save_model demo-model \
           -gpu_ranks 0 \
           -batch_size 20 \
           -max_grad_norm 20 \
           -learning_rate 0.1 \
           -word_vec_size 80 \
           -encoder_type brnn \
           -image_channel_size 1
```
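
Training periodically saves checkpoints under the `-save_model` prefix, with the validation accuracy and perplexity embedded in the file name (as in step 3 below). To list them:

```bash
# Checkpoints are named like demo-model_acc_<acc>_ppl_<ppl>_e<epoch>.pt;
# substitute the one you want to decode with in the next step.
ls demo-model*.pt
```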
|
|
|
3) Translate the images. |
|
|
|
```bash
onmt_translate -data_type img \
               -model demo-model_acc_x_ppl_x_e13.pt \
               -src_dir data/im2text/images \
               -src data/im2text/src-test.txt \
               -output pred.txt \
               -max_length 150 \
               -beam_size 5 \
               -gpu 0 \
               -verbose
```
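
Each line of `pred.txt` holds the predicted token sequence for the image on the corresponding line of `src-test.txt`. For a quick look:

```bash
# Show the first few image paths next to their predicted LaTeX token sequences
# (paste separates the two columns with a tab by default).
paste data/im2text/src-test.txt pred.txt | head -3
```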
|
|
|
The above dataset is sampled from the [im2latex-100k-dataset](http://lstm.seas.harvard.edu/latex/im2text.tgz). We also provide a model trained on this dataset [[link]](http://lstm.seas.harvard.edu/latex/py-model.pt).
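
As a sketch, the pretrained checkpoint can be downloaded and used directly in place of the toy model from step 2 (the flags mirror step 3):

```bash
# Download the released checkpoint and decode the toy test set with it.
wget -O py-model.pt http://lstm.seas.harvard.edu/latex/py-model.pt
onmt_translate -data_type img -model py-model.pt \
               -src_dir data/im2text/images -src data/im2text/src-test.txt \
               -output pred.txt -max_length 150 -beam_size 5 -gpu 0 -verbose
```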
|
|
|
### Options |
|
|
|
* `-src_dir`: The directory containing the images. |
|
|
|
* `-train_tgt`: The file storing the tokenized labels, one label per line. It should look like:
|
```
<label0_token0> <label0_token1> ... <label0_tokenN0>
<label1_token0> <label1_token1> ... <label1_tokenN1>
<label2_token0> <label2_token1> ... <label2_tokenN2>
...
```
|
|
|
* `-train_src`: The file storing the paths of the images (relative to `src_dir`), one path per line. It should look like:
|
```
<image0_path>
<image1_path>
<image2_path>
...
```
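
As a sketch, a simple check that every path listed in the source file resolves to an image under `-src_dir` (the paths below assume the toy dataset layout from the Quick Start):

```bash
# Report any image referenced in src-train.txt that is missing from src_dir.
while read -r img; do
  [ -f "data/im2text/images/$img" ] || echo "missing: $img"
done < data/im2text/src-train.txt
```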
|
|