File size: 2,286 Bytes
ccd9308 87feab4 ccd9308 87feab4 039dd31 87feab4 128d177 87feab4 128d177 87feab4 128d177 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 |
---
license: apache-2.0
language: en
---
# The Chart-To-Table Model
The Chart-To-Table model was introduced in the paper "Do LVLMs Understand Charts?
Analyzing and Correcting Factual Errors in Chart Captioning" for converting a chart into a structured table. The generated tables use `&&&` to delimit rows and `|` to delimit columns. The underlying architecture of this model is UniChart.
### How to use
```python
from transformers import DonutProcessor, VisionEncoderDecoderModel
from PIL import Image
model_name = "khhuang/chart-to-table"
model = VisionEncoderDecoderModel.from_pretrained(model_name).cuda()
processor = DonutProcessor.from_pretrained(model_name)
image_path = "PATH_TO_IMAGE"
# Format text inputs
input_prompt = "<data_table_generation> <s_answer>"
# Encode chart figure and tokenize text
img = Image.open(IMAGE_PATH)
pixel_values = processor(img.convert("RGB"), random_padding=False, return_tensors="pt").pixel_values
pixel_values = pixel_values.cuda()
decoder_input_ids = processor.tokenizer(input_prompt, add_special_tokens=False, return_tensors="pt", max_length=510).input_ids.cuda()#.squeeze(0)
outputs = model.generate(
pixel_values.cuda(),
decoder_input_ids=decoder_input_ids.cuda(),
max_length=model.decoder.config.max_position_embeddings,
early_stopping=True,
pad_token_id=processor.tokenizer.pad_token_id,
eos_token_id=processor.tokenizer.eos_token_id,
use_cache=True,
num_beams=4,
bad_words_ids=[[processor.tokenizer.unk_token_id]],
return_dict_in_generate=True,
)
sequence = processor.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
extracted_table = sequence.split("<s_answer>")[1].strip()
```
### Citation
```bibtex
@misc{huang-etal-2023-do,
title = "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning",
author = "Huang, Kung-Hsiang and
Zhou, Mingyang and
Chan, Hou Pong and
Fung, Yi R. and
Wang, Zhenhailong and
Zhang, Lingyu and
Chang, Shih-Fu and
Ji, Heng",
year={2023},
eprint={2312.10160},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
``` |