Commit fcaa162 (verified) by ahmed-masry
1 Parent(s): 766d3f9

Update README.md

Files changed (1)
  1. README.md +10 -1
README.md CHANGED
@@ -7,7 +7,7 @@ language:
  # ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
 
 
- Paper Link:
+ Paper Link: https://arxiv.org/abs/2407.04172
 
  The abstract of the paper states that:
  > Given the ubiquity of charts as a data analysis, visualization, and decision-making tool across industries and sciences, there has been a growing interest in developing pre-trained foundation models as well as general purpose instruction-tuned models for chart understanding and reasoning. However, existing methods suffer crucial drawbacks across two critical axes affecting the performance of chart representation models: they are trained on data generated from underlying data tables of the charts, ignoring the visual trends and patterns in chart images, \emph{and} use weakly aligned vision-language backbone models for domain-specific training, limiting their generalizability when encountering charts in the wild. We address these important drawbacks and introduce ChartGemma, a novel chart understanding and reasoning model developed over PaliGemma. Rather than relying on underlying data tables, ChartGemma is trained on instruction-tuning data generated directly from chart images, thus capturing both high-level trends and low-level visual information from a diverse set of charts. Our simple approach achieves state-of-the-art results across $5$ benchmarks spanning chart summarization, question answering, and fact-checking, and our elaborate qualitative studies on real-world charts show that ChartGemma generates more realistic and factually correct summaries compared to its contemporaries.
@@ -62,5 +62,14 @@ If you have any questions about this work, please contact **[Ahmed Masry](https:
  Please cite our paper if you use our model in your research.
 
  ```
+ @misc{masry2024chartgemmavisualinstructiontuningchart,
+ title={ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild},
+ author={Ahmed Masry and Megh Thakkar and Aayush Bajaj and Aaryaman Kartha and Enamul Hoque and Shafiq Joty},
+ year={2024},
+ eprint={2407.04172},
+ archivePrefix={arXiv},
+ primaryClass={cs.AI},
+ url={https://arxiv.org/abs/2407.04172},
+ }
 
  ```