khhuang/chart-to-table

Image-Text-to-Text

vision-encoder-decoder

Inference Endpoints

khuangaf committed on Dec 17, 2023

Commit 87feab4 · 1 Parent(s): ccd9308

update readme

Files changed (1)

README.md +73 -0

@@ -1,3 +1,76 @@
 ---
 license: apache-2.0
 ---

 ---
 license: apache-2.0
+language: en
 ---
+# The Chart-To-Table Model
+The Chart-To-Table model was introduced in the paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning". It converts a chart image into a structured table. The generated tables use `&&&` to delimit rows and `|` to delimit columns. The underlying architecture of this model is UniChart.
+### How to use
+```python
+import torch
+from transformers import DonutProcessor, VisionEncoderDecoderModel
+from PIL import Image
+model_name = "khhuang/chart-to-table"
+# Run on a GPU when one is available, otherwise fall back to CPU
+device = "cuda" if torch.cuda.is_available() else "cpu"
+model = VisionEncoderDecoderModel.from_pretrained(model_name).to(device)
+processor = DonutProcessor.from_pretrained(model_name)
+image_path = "PATH_TO_IMAGE"
+# Format text inputs
+input_prompt = "<data_table_generation> <s_answer>"
+# Encode chart figure and tokenize text
+img = Image.open(image_path)
+pixel_values = processor(img.convert("RGB"), random_padding=False, return_tensors="pt").pixel_values
+decoder_input_ids = processor.tokenizer(input_prompt, add_special_tokens=False, return_tensors="pt", max_length=510).input_ids
+# Generate the table tokens with beam search
+outputs = model.generate(
+        pixel_values.to(device),
+        decoder_input_ids=decoder_input_ids.to(device),
+        max_length=model.decoder.config.max_position_embeddings,
+        early_stopping=True,
+        pad_token_id=processor.tokenizer.pad_token_id,
+        eos_token_id=processor.tokenizer.eos_token_id,
+        use_cache=True,
+        num_beams=4,
+        bad_words_ids=[[processor.tokenizer.unk_token_id]],
+        return_dict_in_generate=True,
+    )
+sequence = processor.batch_decode(outputs.sequences)[0]
+sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
+extracted_table = sequence.split("<s_answer>")[1].strip()
+```
+### Citation
+```
+@misc{huang-etal-2023-do,
+    title = "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning",
+    author = "Huang, Kung-Hsiang and
+      Zhou, Mingyang and
+      Chan, Hou Pong and
+      Fung, Yi R. and
+      Wang, Zhenhailong and
+      Zhang, Lingyu and
+      Chang, Shih-Fu and
+      Ji, Heng",
+    year={2023},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
+```
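
The README in this commit states that generated tables use `&&&` to delimit rows and `|` to delimit columns. As a minimal sketch of how the `extracted_table` string might be turned into a list of rows, the helper below splits on those two delimiters. The function name `parse_chart_table` and the sample string are hypothetical and made up for illustration; they are not part of the model's code or real model output.

```python
def parse_chart_table(extracted_table: str) -> list[list[str]]:
    """Split a Chart-To-Table output string into rows of cell strings.

    Assumes the format described in the README: rows separated by '&&&'
    and columns separated by '|'.
    """
    rows = []
    for raw_row in extracted_table.split("&&&"):
        raw_row = raw_row.strip()
        if not raw_row:
            continue  # skip empty rows, e.g. from a trailing delimiter
        rows.append([cell.strip() for cell in raw_row.split("|")])
    return rows


# Hypothetical output string, invented for illustration only
example = "Year | Sales &&& 2020 | 10 &&& 2021 | 15"
for row in parse_chart_table(example):
    print(row)
```

The sketch does not treat any row as a header, since the README does not specify one. Callers that know their charts carry a header row can take the first element of the returned list themselves.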