---
license: apache-2.0
language: en
---

# The Chart-To-Table Model

The Chart-To-Table model was introduced in the paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning" for converting a chart into a structured table. The generated tables use `&&&` to delimit rows and `|` to delimit columns. The underlying architecture of this model is UniChart.

### How to use

```python
import torch
from transformers import DonutProcessor, VisionEncoderDecoderModel
from PIL import Image

model_name = "khhuang/chart-to-table"
device = "cuda" if torch.cuda.is_available() else "cpu"
model = VisionEncoderDecoderModel.from_pretrained(model_name).to(device)
processor = DonutProcessor.from_pretrained(model_name)

image_path = "PATH_TO_IMAGE"

# Format text inputs: this task prompt triggers table generation
input_prompt = "<data_table_generation> <s_answer>"

# Encode chart figure and tokenize text
img = Image.open(image_path)
pixel_values = processor(img.convert("RGB"), random_padding=False, return_tensors="pt").pixel_values
pixel_values = pixel_values.to(device)
decoder_input_ids = processor.tokenizer(
    input_prompt, add_special_tokens=False, return_tensors="pt", max_length=510
).input_ids.to(device)

outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.decoder.config.max_position_embeddings,
    early_stopping=True,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
    use_cache=True,
    num_beams=4,
    bad_words_ids=[[processor.tokenizer.unk_token_id]],
    return_dict_in_generate=True,
)

# Decode, strip special tokens, and keep the text after the <s_answer> tag
sequence = processor.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
extracted_table = sequence.split("<s_answer>")[1].strip()
```
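The generated string can then be split into rows and cells using the `&&&` and `|` delimiters described above. Below is a minimal sketch; the `parse_table` helper and the example string are illustrative, not part of the model's API:

```python
def parse_table(table_str):
    """Split a generated table string into a list of rows, where each
    row is a list of cell strings. Rows are delimited by '&&&' and
    columns by '|'; surrounding whitespace is stripped from each cell."""
    return [
        [cell.strip() for cell in row.split("|")]
        for row in table_str.split("&&&")
    ]

# Hypothetical model output for a simple two-column chart
example = "Year | Sales &&& 2020 | 10 &&& 2021 | 15"
print(parse_table(example))
# → [['Year', 'Sales'], ['2020', '10'], ['2021', '15']]
```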
59
+
60
+ ### Citation
61
+ ```
62
+ @misc{huang-etal-2023-do,
63
+ title = "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning",
64
+ author = "Huang, Kung-Hsiang and
65
+ Zhou, Mingyang and
66
+ Chan, Hou Pong and
67
+ Fung, Yi R. and
68
+ Wang, Zhenhailong and
69
+ Zhang, Lingyu and
70
+ Chang, Shih-Fu and
71
+ Ji, Heng",
72
+ year={2023},
73
+ archivePrefix={arXiv},
74
+ primaryClass={cs.CL}
75
+ }
76
+ ```