---
license: gpl-3.0
language:
- en
---

# ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild

Paper Link: https://arxiv.org/abs/2407.04172

The abstract of the paper states:
> Given the ubiquity of charts as a data analysis, visualization, and decision-making tool across industries and sciences, there has been a growing interest in developing pre-trained foundation models as well as general purpose instruction-tuned models for chart understanding and reasoning. However, existing methods suffer crucial drawbacks across two critical axes affecting the performance of chart representation models: they are trained on data generated from underlying data tables of the charts, ignoring the visual trends and patterns in chart images, *and* use weakly aligned vision-language backbone models for domain-specific training, limiting their generalizability when encountering charts in the wild. We address these important drawbacks and introduce ChartGemma, a novel chart understanding and reasoning model developed over PaliGemma. Rather than relying on underlying data tables, ChartGemma is trained on instruction-tuning data generated directly from chart images, thus capturing both high-level trends and low-level visual information from a diverse set of charts. Our simple approach achieves state-of-the-art results across 5 benchmarks spanning chart summarization, question answering, and fact-checking, and our elaborate qualitative studies on real-world charts show that ChartGemma generates more realistic and factually correct summaries compared to its contemporaries.

# Web Demo
If you wish to quickly try our model, you can access our public web demo hosted on the Hugging Face Spaces platform with a friendly interface!

[ChartGemma Web Demo](https://huggingface.co/spaces/ahmed-masry/ChartGemma)

# Inference
You can easily use our model for inference with the Hugging Face transformers library!
You just need to do the following:
1. Change **_image_path_** to the path of your chart image on your system
2. Write the **_input_text_**

We recommend using beam search with a beam size of 4, but if your machine has low memory, you can remove the `num_beams` argument from the `generate` method; a low-memory variant is sketched after the example below.
```python
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration
import torch

# Download an example chart image from the ChartQA repository
torch.hub.download_url_to_file('https://raw.githubusercontent.com/vis-nlp/ChartQA/main/ChartQA%20Dataset/val/png/multi_col_1229.png', 'chart_example_1.png')

image_path = "chart_example_1.png"
input_text = "program of thought: what is the sum of Facebook Messenger and WhatsApp values in the 18-29 age group?"

# Load model and processor
model = PaliGemmaForConditionalGeneration.from_pretrained("ahmed-masry/chartgemma", torch_dtype=torch.float16)
processor = AutoProcessor.from_pretrained("ahmed-masry/chartgemma")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# Process inputs
image = Image.open(image_path).convert('RGB')
inputs = processor(text=input_text, images=image, return_tensors="pt")
prompt_length = inputs['input_ids'].shape[1]
inputs = {k: v.to(device) for k, v in inputs.items()}

# Generate, then decode only the newly generated tokens (the prompt is sliced off)
generate_ids = model.generate(**inputs, num_beams=4, max_new_tokens=512)
output_text = processor.batch_decode(generate_ids[:, prompt_length:], skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(output_text)
```
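
If your machine has limited memory, you can drop beam search and decode greedily, as noted above. A minimal sketch of that variant, reusing the `model`, `processor`, `inputs`, and `prompt_length` already set up in the example:

```python
# Low-memory variant: greedy decoding (no beam search).
# Reuses `model`, `processor`, `inputs`, and `prompt_length` from the example above.
generate_ids = model.generate(**inputs, max_new_tokens=512)
output_text = processor.batch_decode(generate_ids[:, prompt_length:], skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
print(output_text)
```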

# Contact
If you have any questions about this work, please contact **[Ahmed Masry](https://ahmedmasryku.github.io/)** at **[email protected]** or **[email protected]**.

# Reference
Please cite our paper if you use our model in your research.

```bibtex
@misc{masry2024chartgemmavisualinstructiontuningchart,
      title={ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild},
      author={Ahmed Masry and Megh Thakkar and Aayush Bajaj and Aaryaman Kartha and Enamul Hoque and Shafiq Joty},
      year={2024},
      eprint={2407.04172},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2407.04172},
}
```