Update README.md
README.md
CHANGED
@@ -7,7 +7,7 @@ language:
 # ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild
 
 
-Paper Link:
+Paper Link: https://arxiv.org/abs/2407.04172
 
 The abstract of the paper states that:
 > Given the ubiquity of charts as a data analysis, visualization, and decision-making tool across industries and sciences, there has been a growing interest in developing pre-trained foundation models as well as general purpose instruction-tuned models for chart understanding and reasoning. However, existing methods suffer crucial drawbacks across two critical axes affecting the performance of chart representation models: they are trained on data generated from underlying data tables of the charts, ignoring the visual trends and patterns in chart images, \emph{and} use weakly aligned vision-language backbone models for domain-specific training, limiting their generalizability when encountering charts in the wild. We address these important drawbacks and introduce ChartGemma, a novel chart understanding and reasoning model developed over PaliGemma. Rather than relying on underlying data tables, ChartGemma is trained on instruction-tuning data generated directly from chart images, thus capturing both high-level trends and low-level visual information from a diverse set of charts. Our simple approach achieves state-of-the-art results across $5$ benchmarks spanning chart summarization, question answering, and fact-checking, and our elaborate qualitative studies on real-world charts show that ChartGemma generates more realistic and factually correct summaries compared to its contemporaries.
@@ -62,5 +62,14 @@ If you have any questions about this work, please contact **[Ahmed Masry](https:
 Please cite our paper if you use our model in your research.
 
 ```
+@misc{masry2024chartgemmavisualinstructiontuningchart,
+title={ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild},
+author={Ahmed Masry and Megh Thakkar and Aayush Bajaj and Aaryaman Kartha and Enamul Hoque and Shafiq Joty},
+year={2024},
+eprint={2407.04172},
+archivePrefix={arXiv},
+primaryClass={cs.AI},
+url={https://arxiv.org/abs/2407.04172},
+}
 
 ```