xxxllz committed (verified) · Commit f9a9f5d · Parent(s): 1f8376b

Create README.md
# ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation

<a href="https://huggingface.co/datasets/xxxllz/Chart2Code-160k" target="_blank">🤗 Dataset (HuggingFace)</a> (TBD) | <a href="https://modelscope.cn/datasets/Noct25/Chart2Code-160k" target="_blank">🤖 Dataset (ModelScope)</a> | <a href="https://huggingface.co/xxxllz/ChartCoder" target="_blank">🤗 Model</a> | <a href="https://arxiv.org/abs/2501.06598" target="_blank">📑 Paper</a>

This repository contains the code to train and run inference with ChartCoder.

## Installation
1. Clone this repo
```
git clone https://github.com/thunlp/ChartCoder.git
```
2. Create the environment
```
cd ChartCoder
conda create -n chartcoder python=3.10 -y
conda activate chartcoder
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```
3. Install the additional packages required for training
```
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```

## Train
The whole training process consists of two stages. To train ChartCoder, first download ```siglip-so400m-patch14-384``` and ```deepseek-coder-6.7b-instruct```.

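One way to fetch the two base models is with the ```huggingface-cli``` tool. This is a sketch, not part of the original instructions; the ```google/``` and ```deepseek-ai/``` repo IDs and the ```checkpoints/``` target directory are assumptions you should adjust to your setup:

```
# Download the vision encoder and the base code LLM to local directories.
# Repo IDs are assumed; verify them on the Hugging Face Hub first.
huggingface-cli download google/siglip-so400m-patch14-384 --local-dir checkpoints/siglip-so400m-patch14-384
huggingface-cli download deepseek-ai/deepseek-coder-6.7b-instruct --local-dir checkpoints/deepseek-coder-6.7b-instruct
```

Any other download method (e.g. ```git lfs clone```) works as well, as long as the training scripts can see the resulting local paths.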
For **Pre-training**, run
```
bash scripts/train/pretrain_siglip.sh
```
For **SFT**, run
```
bash scripts/train/finetune_siglip_a4.sh
```
Please change the model paths in these scripts to your local paths; see the corresponding ```.sh``` files for details.
We also provide other training scripts, such as ones using CLIP (```_clip```) and multiple machines (```_m```). See ```scripts/train``` for further information.
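The lines to edit in each script are the model-path variables. The variable names below are hypothetical, shown only to illustrate the kind of change needed; open the actual ```.sh``` file to see the real ones:

```
# Hypothetical variable names -- confirm against the real script.
LLM_PATH=/path/to/deepseek-coder-6.7b-instruct    # base code LLM
VISION_TOWER=/path/to/siglip-so400m-patch14-384   # vision encoder
```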

## Citation
If you find this work useful, consider giving this repository a star ⭐️ and citing 📝 our paper as follows:
```
@misc{zhao2025chartcoderadvancingmultimodallarge,
      title={ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation},
      author={Xuanle Zhao and Xianzhen Luo and Qi Shi and Chi Chen and Shuo Wang and Wanxiang Che and Zhiyuan Liu and Maosong Sun},
      year={2025},
      eprint={2501.06598},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2501.06598},
}
```