	---
	library_name: transformers
	license: mit
	datasets:
	- SpursgoZmy/MMTab
	- apoidea/pubtabnet-html
	language:
	- en
	base_model: google/pix2struct-base
	pipeline_tag: image-to-text
	---

	# pix2struct-base-table2html

	Turn table images into HTML!


	## Demo app

	Try the [demo app](https://huggingface.co/spaces/KennethTM/Table2html-table-detection-and-recognition), which combines table detection and table recognition!


	## About

	This model takes an image of a table and outputs HTML: it parses the image, performing both optical character recognition (OCR) and table structure recognition, and returns the result as an HTML table.

	The model expects an image containing only a table. If the table is embedded in a document, first use a table detection model to extract it (e.g. [Microsoft's Table Transformer model](https://huggingface.co/microsoft/table-transformer-detection)).
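Table detectors typically return a bounding box in pixel coordinates, and a tight box can clip the table border. Below is a minimal sketch, assuming the detector returns `(xmin, ymin, xmax, ymax)` and that a few pixels of padding help (both are assumptions, not requirements stated for this model). It computes a crop region clamped to the image bounds. The helper name `padded_crop_box` and its default padding are hypothetical.

```python
import math


def padded_crop_box(box, image_size, pad=10):
    """Expand an (xmin, ymin, xmax, ymax) box by `pad` pixels and clamp it to the image.

    `pad=10` is an illustrative default, not a setting taken from this model card.
    Coordinates are rounded outward so fractional boxes never cut into the table.
    """
    xmin, ymin, xmax, ymax = box
    width, height = image_size
    return (
        max(0, math.floor(xmin) - pad),
        max(0, math.floor(ymin) - pad),
        min(width, math.ceil(xmax) + pad),
        min(height, math.ceil(ymax) + pad),
    )


# Usage with PIL, where image.size is (width, height) and Image.crop takes a 4-tuple:
# table_image = image.crop(padded_crop_box(detected_box, image.size))
```

The box is clamped rather than rejected, so a table touching the page edge still yields a valid crop.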

	The model is fine-tuned from the [Pix2Struct base model](https://huggingface.co/google/pix2struct-base) with `max_patches=1024` and a maximum generation length of 1024 tokens. `max_patches` should likely be kept at 1024 for inference, whereas the generation length can be changed.
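Because generation is capped by a token budget, a very large table may be cut off before the model finishes. The following is a simple heuristic check, assuming that a complete prediction ends with a closing `</table>` tag, as the example output in this card does. The helper `looks_truncated` is hypothetical and not part of the model or `transformers`.

```python
def looks_truncated(html: str) -> bool:
    """Heuristically flag a prediction that may have hit the generation limit.

    Assumes a complete prediction opens and closes its <table> element(s);
    this is an illustrative check, not an official validation step.
    """
    text = html.strip().lower()
    unbalanced = text.count("<table") != text.count("</table>")
    return unbalanced or not text.endswith("</table>")
```

If a prediction is flagged, re-running with a larger `max_new_tokens` in `model.generate` may help, although the model was fine-tuned with a 1024-token limit, so output quality beyond that length is not guaranteed.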

	The model has been trained using two datasets: [MMTab](https://huggingface.co/datasets/SpursgoZmy/MMTab) and [PubTabNet](https://huggingface.co/datasets/apoidea/pubtabnet-html).

	## Usage

	Below is a complete example of loading the model and performing inference on an example table image (example from the [MMTab dataset](https://huggingface.co/datasets/SpursgoZmy/MMTab)):

	```python
	import torch
	from transformers import AutoProcessor, Pix2StructForConditionalGeneration
	from PIL import Image
	import requests
	from io import BytesIO

	# Load model and processor
	device = "cuda" if torch.cuda.is_available() else "cpu"
	processor = AutoProcessor.from_pretrained("KennethTM/pix2struct-base-table2html")
	model = Pix2StructForConditionalGeneration.from_pretrained("KennethTM/pix2struct-base-table2html")
	model.to(device)
	model.eval()

	# Load example image from URL
	url = "https://huggingface.co/KennethTM/pix2struct-base-table2html/resolve/main/example_recog_1.jpg"
	response = requests.get(url, timeout=30)
	response.raise_for_status()
	image = Image.open(BytesIO(response.content)).convert("RGB")

	# Run model inference (keep max_patches at 1024, the value used during fine-tuning)
	encoding = processor(images=image, return_tensors="pt", max_patches=1024)
	with torch.inference_mode():
	    flattened_patches = encoding.pop("flattened_patches").to(device)
	    attention_mask = encoding.pop("attention_mask").to(device)
	    predictions = model.generate(flattened_patches=flattened_patches, attention_mask=attention_mask, max_new_tokens=1024)

	predictions_decoded = processor.tokenizer.batch_decode(predictions, skip_special_tokens=True)

	# Show predictions as text
	print(predictions_decoded[0])
	```
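To inspect a prediction in a browser, the decoded string can be wrapped in a minimal HTML page. This is a sketch using only the Python standard library; the helper name `to_html_document` and the page layout are illustrative assumptions, not part of the model or `transformers`.

```python
import html


def to_html_document(table_html: str, title: str = "table2html prediction") -> str:
    """Wrap a predicted <table> fragment in a minimal standalone HTML page.

    The title is escaped; the table fragment is inserted verbatim so it renders.
    """
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n<meta charset=\"utf-8\">\n"
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"{table_html}\n"
        "</body>\n</html>\n"
    )


# Hypothetical usage after running the example above:
# with open("prediction.html", "w", encoding="utf-8") as f:
#     f.write(to_html_document(predictions_decoded[0]))
```

The model output is embedded as-is, so treat it as untrusted markup if you serve it to other users.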

	Example image:

	![Example table image from the MMTab dataset](https://huggingface.co/KennethTM/pix2struct-base-table2html/resolve/main/example_recog_1.jpg)

	Model HTML output for the example image:

	```html
	<table border="1" cellspacing="0">
	<tr>
	<th>
	Rank
	</th>
	<th>
	Lane
	</th>
	<th>
	Name
	</th>
	<th>
	Nationality
	</th>
	<th>
	Time
	</th>
	<th>
	Notes
	</th>
	</tr>
	<tr>
	<td>
	</td>
	<td>
	4
	</td>
	<td>
	Michael Phelps
	</td>
	<td>
	United States
	</td>
	<td>
	51.25
	</td>
	<td>
	OR
	</td>
	</tr>
	<tr>
	<td>
	</td>
	<td>
	3
	</td>
	<td>
	Ian Crocker
	</td>
	<td>
	United States
	</td>
	<td>
	51.29
	</td>
	<td>
	</td>
	</tr>
	<tr>
	<td>
	</td>
	<td>
	5
	</td>
	<td>
	Andriy Serdinov
	</td>
	<td>
	Ukraine
	</td>
	<td>
	51.36
	</td>
	<td>
	EU
	</td>
	</tr>
	<tr>
	<td>
	4
	</td>
	<td>
	1
	</td>
	<td>
	Thomas Rupprath
	</td>
	<td>
	Germany
	</td>
	<td>
	52.27
	</td>
	<td>
	</td>
	</tr>
	<tr>
	<td>
	5
	</td>
	<td>
	6
	</td>
	<td>
	Igor Marchenko
	</td>
	<td>
	Russia
	</td>
	<td>
	52.32
	</td>
	<td>
	</td>
	</tr>
	<tr>
	<td>
	6
	</td>
	<td>
	2
	</td>
	<td>
	Gabriel Mangabeira
	</td>
	<td>
	Brazil
	</td>
	<td>
	52.34
	</td>
	<td>
	</td>
	</tr>
	<tr>
	<td>
	7
	</td>
	<td>
	8
	</td>
	<td>
	Duje Draganja
	</td>
	<td>
	Croatia
	</td>
	<td>
	52.46
	</td>
	<td>
	</td>
	</tr>
	<tr>
	<td>
	8
	</td>
	<td>
	7
	</td>
	<td>
	Geoff Huegill
	</td>
	<td>
	Australia
	</td>
	<td>
	52.56
	</td>
	<td>
	</td>
	</tr>
	</table>
	```

	And the rendered HTML table:

	<table border="1" cellspacing="0">
	<tr>
	<th>
	Rank
	</th>
	<th>
	Lane
	</th>
	<th>
	Name
	</th>
	<th>
	Nationality
	</th>
	<th>
	Time
	</th>
	<th>
	Notes
	</th>
	</tr>
	<tr>
	<td>
	</td>
	<td>
	4
	</td>
	<td>
	Michael Phelps
	</td>
	<td>
	United States
	</td>
	<td>
	51.25
	</td>
	<td>
	OR
	</td>
	</tr>
	<tr>
	<td>
	</td>
	<td>
	3
	</td>
	<td>
	Ian Crocker
	</td>
	<td>
	United States
	</td>
	<td>
	51.29
	</td>
	<td>
	</td>
	</tr>
	<tr>
	<td>
	</td>
	<td>
	5
	</td>
	<td>
	Andriy Serdinov
	</td>
	<td>
	Ukraine
	</td>
	<td>
	51.36
	</td>
	<td>
	EU
	</td>
	</tr>
	<tr>
	<td>
	4
	</td>
	<td>
	1
	</td>
	<td>
	Thomas Rupprath
	</td>
	<td>
	Germany
	</td>
	<td>
	52.27
	</td>
	<td>
	</td>
	</tr>
	<tr>
	<td>
	5
	</td>
	<td>
	6
	</td>
	<td>
	Igor Marchenko
	</td>
	<td>
	Russia
	</td>
	<td>
	52.32
	</td>
	<td>
	</td>
	</tr>
	<tr>
	<td>
	6
	</td>
	<td>
	2
	</td>
	<td>
	Gabriel Mangabeira
	</td>
	<td>
	Brazil
	</td>
	<td>
	52.34
	</td>
	<td>
	</td>
	</tr>
	<tr>
	<td>
	7
	</td>
	<td>
	8
	</td>
	<td>
	Duje Draganja
	</td>
	<td>
	Croatia
	</td>
	<td>
	52.46
	</td>
	<td>
	</td>
	</tr>
	<tr>
	<td>
	8
	</td>
	<td>
	7
	</td>
	<td>
	Geoff Huegill
	</td>
	<td>
	Australia
	</td>
	<td>
	52.56
	</td>
	<td>
	</td>
	</tr>
	</table>
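To use a prediction downstream, the HTML can be flattened into rows and written as CSV. Below is a minimal sketch using the standard library's `html.parser` and `csv` modules; the class `TableRowParser` and the function `html_table_to_csv` are hypothetical names. The sketch ignores `rowspan` and `colspan` attributes, so merged cells (common in PubTabNet-style tables) are not expanded.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse the newlines and indentation around the cell text
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed in this row

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_csv(table_html: str) -> str:
    """Convert a predicted HTML table into CSV text (spans are not expanded)."""
    parser = TableRowParser()
    parser.feed(table_html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer).writerows(parser.rows)
    return buffer.getvalue()
```

Empty cells, such as the blank `Rank` entries in the example output, are kept as empty strings so the columns stay aligned.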