---
library_name: transformers
license: mit
datasets:
- SpursgoZmy/MMTab
- apoidea/pubtabnet-html
language:
- en
base_model: google/pix2struct-base
pipeline_tag: image-to-text
---

# pix2struct-base-table2html

*Turn table images into HTML!*

## Demo app

Try the [demo app](https://huggingface.co/spaces/KennethTM/Table2html-table-detection-and-recognition), which combines table detection and table recognition!

## About

This model takes an image of a table and outputs HTML. It parses the image, performing both optical character recognition (OCR) and table structure recognition, and emits the result as HTML markup.

The model expects an image containing only a table. If the table is embedded in a document, first use a table detection model to extract it (e.g. [Microsoft's Table Transformer model](https://huggingface.co/microsoft/table-transformer-detection)).
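As a rough sketch of that detection step, the snippet below runs the Table Transformer detector and crops each detected table with a small margin. The helper names `pad_box` and `crop_tables`, the 0.9 score threshold, and the 10-pixel padding are illustrative assumptions, not settings taken from this model's training or from the demo app.

```python
def pad_box(box, image_size, pad=10):
    """Expand an (xmin, ymin, xmax, ymax) box by `pad` pixels, clamped to the image."""
    xmin, ymin, xmax, ymax = box
    width, height = image_size
    return (
        max(0, int(xmin - pad)),
        max(0, int(ymin - pad)),
        min(width, int(xmax + pad)),
        min(height, int(ymax + pad)),
    )


def crop_tables(image, threshold=0.9, pad=10):
    """Return one cropped PIL image per detected table (hypothetical helper)."""
    # Heavy imports are kept local so pad_box can be used without torch installed.
    import torch
    from transformers import AutoImageProcessor, TableTransformerForObjectDetection

    name = "microsoft/table-transformer-detection"
    processor = AutoImageProcessor.from_pretrained(name)
    detector = TableTransformerForObjectDetection.from_pretrained(name)

    image = image.convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = detector(**inputs)

    # PIL reports (width, height); post-processing expects (height, width).
    target_sizes = torch.tensor([image.size[::-1]])
    result = processor.post_process_object_detection(
        outputs, threshold=threshold, target_sizes=target_sizes
    )[0]

    # Assumed padding: a small margin so cell borders are not cut off.
    return [image.crop(pad_box(box.tolist(), image.size, pad)) for box in result["boxes"]]
```

Each returned crop can then be passed to this model in place of the full page image. The threshold and padding may need tuning for a given document type.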

The model is fine-tuned from the [Pix2Struct base model](https://huggingface.co/google/pix2struct-base) using a maximum of 1024 patches (the processor's `max_patches` argument) and a maximum generation length of 1024 tokens. The patch limit should likely stay at 1024 for inference, but the generation length (`max_new_tokens`) can be changed.
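To give a feel for what the patch limit means, the sketch below estimates how an image is divided into patches for a given `max_patches`. It assumes 16x16-pixel patches and follows the aspect-ratio-preserving resize used by the Pix2Struct image processor in `transformers` as I understand it; the function name `patch_grid` is hypothetical, and the processor's own output remains the authority.

```python
import math


def patch_grid(image_width, image_height, max_patches=1024, patch_size=16):
    """Estimate the (rows, cols) patch grid for an image.

    Assumption: the image is rescaled, keeping its aspect ratio, to the
    largest size whose grid of patch_size x patch_size patches still fits
    within max_patches.
    """
    scale = math.sqrt(
        max_patches * (patch_size / image_height) * (patch_size / image_width)
    )
    rows = max(min(math.floor(scale * image_height / patch_size), max_patches), 1)
    cols = max(min(math.floor(scale * image_width / patch_size), max_patches), 1)
    return rows, cols


rows, cols = patch_grid(640, 480)
print(rows, cols, rows * cols)  # 27 36 972
```

Under these assumptions, wide and tall tables alike are squeezed into at most 1024 patches, so very large tables lose per-cell resolution; this is one reason to crop tightly around the table before inference.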

The model has been trained using two datasets: [MMTab](https://huggingface.co/datasets/SpursgoZmy/MMTab) and [PubTabNet](https://huggingface.co/datasets/apoidea/pubtabnet-html).

## Usage

Below is a complete example of loading the model and performing inference on an example table image (example from the [MMTab dataset](https://huggingface.co/datasets/SpursgoZmy/MMTab)):

```python
import torch
from transformers import AutoProcessor, Pix2StructForConditionalGeneration
from PIL import Image
import requests
from io import BytesIO

# Load model and processor
device = "cuda" if torch.cuda.is_available() else "cpu"
processor = AutoProcessor.from_pretrained("KennethTM/pix2struct-base-table2html")
model = Pix2StructForConditionalGeneration.from_pretrained("KennethTM/pix2struct-base-table2html")
model.to(device)
model.eval()

# Load example image from URL
url = "https://huggingface.co/KennethTM/pix2struct-base-table2html/resolve/main/example_recog_1.jpg"
response = requests.get(url)
response.raise_for_status()  # Fail early if the download did not succeed
image = Image.open(BytesIO(response.content))

# Run model inference
encoding = processor(image, return_tensors="pt", max_patches=1024)
with torch.inference_mode():
    flattened_patches = encoding.pop("flattened_patches").to(device)
    attention_mask = encoding.pop("attention_mask").to(device)
    predictions = model.generate(flattened_patches=flattened_patches, attention_mask=attention_mask, max_new_tokens=1024)

predictions_decoded = processor.tokenizer.batch_decode(predictions, skip_special_tokens=True)

# Show predictions as text
print(predictions_decoded[0])
```

Example image:

![](https://huggingface.co/KennethTM/pix2struct-base-table2html/resolve/main/example_recog_1.jpg)

Model HTML output for example image:

```html
<table border="1" cellspacing="0">
<tr>
<th>
Rank
</th>
<th>
Lane
</th>
<th>
Name
</th>
<th>
Nationality
</th>
<th>
Time
</th>
<th>
Notes
</th>
</tr>
<tr>
<td>
</td>
<td>
4
</td>
<td>
Michael Phelps
</td>
<td>
United States
</td>
<td>
51.25
</td>
<td>
OR
</td>
</tr>
<tr>
<td>
</td>
<td>
3
</td>
<td>
Ian Crocker
</td>
<td>
United States
</td>
<td>
51.29
</td>
<td>
</td>
</tr>
<tr>
<td>
</td>
<td>
5
</td>
<td>
Andriy Serdinov
</td>
<td>
Ukraine
</td>
<td>
51.36
</td>
<td>
EU
</td>
</tr>
<tr>
<td>
4
</td>
<td>
1
</td>
<td>
Thomas Rupprath
</td>
<td>
Germany
</td>
<td>
52.27
</td>
<td>
</td>
</tr>
<tr>
<td>
5
</td>
<td>
6
</td>
<td>
Igor Marchenko
</td>
<td>
Russia
</td>
<td>
52.32
</td>
<td>
</td>
</tr>
<tr>
<td>
6
</td>
<td>
2
</td>
<td>
Gabriel Mangabeira
</td>
<td>
Brazil
</td>
<td>
52.34
</td>
<td>
</td>
</tr>
<tr>
<td>
7
</td>
<td>
8
</td>
<td>
Duje Draganja
</td>
<td>
Croatia
</td>
<td>
52.46
</td>
<td>
</td>
</tr>
<tr>
<td>
8
</td>
<td>
7
</td>
<td>
Geoff Huegill
</td>
<td>
Australia
</td>
<td>
52.56
</td>
<td>
</td>
</tr>
</table>
```

And the rendered HTML table:

<table border="1" cellspacing="0">
<tr>
<th>
Rank
</th>
<th>
Lane
</th>
<th>
Name
</th>
<th>
Nationality
</th>
<th>
Time
</th>
<th>
Notes
</th>
</tr>
<tr>
<td>
</td>
<td>
4
</td>
<td>
Michael Phelps
</td>
<td>
United States
</td>
<td>
51.25
</td>
<td>
OR
</td>
</tr>
<tr>
<td>
</td>
<td>
3
</td>
<td>
Ian Crocker
</td>
<td>
United States
</td>
<td>
51.29
</td>
<td>
</td>
</tr>
<tr>
<td>
</td>
<td>
5
</td>
<td>
Andriy Serdinov
</td>
<td>
Ukraine
</td>
<td>
51.36
</td>
<td>
EU
</td>
</tr>
<tr>
<td>
4
</td>
<td>
1
</td>
<td>
Thomas Rupprath
</td>
<td>
Germany
</td>
<td>
52.27
</td>
<td>
</td>
</tr>
<tr>
<td>
5
</td>
<td>
6
</td>
<td>
Igor Marchenko
</td>
<td>
Russia
</td>
<td>
52.32
</td>
<td>
</td>
</tr>
<tr>
<td>
6
</td>
<td>
2
</td>
<td>
Gabriel Mangabeira
</td>
<td>
Brazil
</td>
<td>
52.34
</td>
<td>
</td>
</tr>
<tr>
<td>
7
</td>
<td>
8
</td>
<td>
Duje Draganja
</td>
<td>
Croatia
</td>
<td>
52.46
</td>
<td>
</td>
</tr>
<tr>
<td>
8
</td>
<td>
7
</td>
<td>
Geoff Huegill
</td>
<td>
Australia
</td>
<td>
52.56
</td>
<td>
</td>
</tr>
</table>
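
For downstream use, the generated HTML can be converted into plain rows of cell text. The following is a minimal sketch using only Python's standard library; the names `TableRowCollector`, `html_table_to_rows`, and `looks_complete` are hypothetical helpers, not part of this model or of `transformers`. It ignores `rowspan` and `colspan`, so merged cells are not expanded, and the completeness check is only a heuristic for output that was cut off by the `max_new_tokens` limit.

```python
from html.parser import HTMLParser


class TableRowCollector(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # Text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:  # Tolerate a cell that appears before any <tr>
                self.rows.append([])
            # Collapse newlines and repeated spaces inside the cell
            self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None


def html_table_to_rows(html):
    """Return the table as a list of rows, each a list of cell strings."""
    parser = TableRowCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


def looks_complete(html):
    """Heuristic: output that hit the generation limit usually lacks </table>."""
    return html.strip().endswith("</table>")


sample = "<table><tr><th>\nRank\n</th><th>\nName\n</th></tr><tr><td>\n</td><td>\nMichael Phelps\n</td></tr></table>"
print(html_table_to_rows(sample))  # [['Rank', 'Name'], ['', 'Michael Phelps']]
print(looks_complete(sample))      # True
```

If `looks_complete` returns `False` for a large table, raising `max_new_tokens` in the `generate` call is a reasonable first thing to try.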