CodeSteer-v1 / README.md

Update README.md

7d9bb99 verified about 2 months ago

6.34 kB

	---
	license: cc-by-4.0
	library_name: transformers
	pipeline_tag: text-generation
	datasets:
	- yongchao98/SymBench
	metrics:
	- accuracy
	base_model:
	- meta-llama/Llama-3.1-8B-Instruct
	tags:
	- Symbolic
	- Code
	- Text
	---

	# CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
	<img src="./Figures/Tag.png" width="650px" alt="s" />
	These are the codes, models, and datasets for the following papers:

	- [CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance](https://arxiv.org/abs/2502.04350)
	- [Steering Large Language Models between Code Execution and Textual Reasoning (ICLR'2025)](https://arxiv.org/pdf/2410.03524)

	Project page: https://github.com/yongchao98/CodeSteer-v1.0/

	[Code](https://github.com/yongchao98/CodeSteer-v1.0) &emsp;&emsp;
	[Huggingface🤗](https://huggingface.co/yongchao98/CodeSteer-v1) &emsp;&emsp;
	[Model Weights](https://drive.google.com/drive/folders/1qb_rec6f8rMYtFKm0eQpad0L0uHCwgpL?usp=share_link)
	[SymBench🤗](https://huggingface.co/datasets/yongchao98/SymBench) &emsp;&emsp;
	[Finetune Datasets](https://drive.google.com/drive/folders/1Byn-99gFd5ckRkPMJ8-zagzW7XDfO8ie?usp=share_link) &emsp;&emsp;
	[SymBench Datasets](https://github.com/yongchao98/CodeSteer-v1.0/tree/main/dataset_gather) &emsp;&emsp;
	[SymBench Synthesis Scripts](https://github.com/yongchao98/CodeSteer-v1.0/tree/main/benchmark)

	## Contents

	- [Framework](#Framework)
	- [Inspirations](#Inspirations)
	- [Performance](#Performance)
	- [Environment_Setup](#Environment_Setup)
	- [LLM_API_Key_Setup](#LLM_API_Key_Setup)
	- [Train_and_Test_Models](#Train_and_Test_Models)
	- [Assistance](#Assistance)
	- [Citation](#Citation)

	## Framework
	<img src="./Figures/CodeSteer-intro.png" width="800px" alt="s" />

	<p align="center" style="font-size: 16px;">
	Figure: CodeSteer on guiding LLM code/text generation to integrate symbolic computing. At each interaction with TaskLLM, it reviews current and previous answers, then provides guidance for the next round.
	</p>

	## Inspirations
	<img src="./Figures/LLM-makes-simple-mistakes-gather.png" width="800px" alt="s" />
	<p align="center" style="font-size: 16px;">
	Figure: The cases that GPT-4o makes simple mistakes by direct textual reasoning but can reliably solve the problem with prompted to use code.
	</p>


	## Performance
	We compare GPT-4o + CodeSteer with OpenAI o1 and DeepSeek R1 on SymBench, with 28 seen tasks and 9 unseen tasks. GPT-4o + CodeSteer surpasses o1 (82.7), R1 (76.8), and o1-preview (74.8), highlighting the importance of integrating symbolic computing into LLMs.

	<img src="./Figures/Table-results.png" width="800px" alt="s" />

	The cost of tokens and runtimes for each method are as follows. GPT-4o + CodeSteer costs less tokens and runtimes than o1 and R1.
	<img src="./Figures/Cost-token-runtime.png" width="800px" alt="s" />

	## Environment_Setup
	The fine-tuning and inference of CodeSteerLLM are based on [Llama-factory](https://github.com/hiyouga/LLaMA-Factory) with some modules modified by us.
	```
	git clone https://github.com/yongchao98/CodeSteer-v1.0.git
	cd CodeSteer-v1.0

	conda create -n CodeSteer python=3.10
	conda activate CodeSteer
	pip install -r requirements.txt
	```

	## LLM_API_Key_Setup
	If you want to use several API-based LLMs as TaskLLM or CodeSteerLLM, then you need to set up API key.

	1. First, create a .env file in your project root:
	```
	OPENAI_API_KEY='your_key_here'
	CLAUDE_API_KEY='your_key_here'
	MIXTRAL_API_KEY='your_key_here'
	DEEPSEEK_API_KEY='your_key_here'
	```
	2. Add this .env file to your .gitignore to prevent accidentally committing it:
	```
	echo ".env" >> .gitignore
	```

	## Train_and_Test_Models

	### Create_test_samples
	The synthesized test samples for 37 tasks of SymBench are in [dataset_gather](https://github.com/yongchao98/CodeSteer-v1.0/tree/main/dataset_gather) dictionary. You can also synthezise the samples by yourself with tunable complexities with scripts in [create_dataset](https://github.com/yongchao98/CodeSteer-v1.0/tree/main/create_dataset).

	### Run inference without GPU, test close LLM as CodeSteerLLM
	We can directly use unfinetuned model like GPT-4o as CodeSteerLLM, in this case directly run
	```
	python benchmark_test_baseline.py
	```

	### Run inference with GPU, test finetuned CodeSteerLLM
	We can infer Llama-3.1-8B with own GPUs (default setting is in infer_CodeSteer.sh using 4*H100 of Harvard Cluster, please modify freely with your own cluster settings). You can also download the [Model Weights](https://drive.google.com/drive/folders/1qb_rec6f8rMYtFKm0eQpad0L0uHCwgpL?usp=share_link) in your local and change the path in llama3_8B_CodeSteer.yaml.

	```bash
	bash infer_CodeSteer.sh
	# default config file is ./llama3_8B_CodeSteer.yaml using the model uploaded on Huggingface.
	```

	### Finetuning CodeSteerLLM with synthesized data
	Both our synthesized datasets of SFT and DPO finetuning are in [Finetune Datasets](https://drive.google.com/drive/folders/1Byn-99gFd5ckRkPMJ8-zagzW7XDfO8ie?usp=share_link).
	We use Llama-factory and DeepSpeed for fintuning processes. First install Llama-factory with:
	```
	git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
	cd LLaMA-Factory
	pip install -e ".[torch,metrics]"
	cd ..
	```

	Then we run the code with (default setting is in train_llama3-8B-CodeSteer.sh using 4*H100 of Harvard Cluster, please modify freely with your own cluster settings):
	```
	bash train_llama3-8B-CodeSteer.sh
	```

	## Assistance

	We appreciate all feedback! Feel free to raise an issue for bugs, questions, or suggestions. Contacting [Yongchao Chen](https://yongchao98.github.io/YongchaoChen/) and [Chuchu Fan](https://chuchu.mit.edu) for any questions and discussion.

	## Citation
	```md
	@misc{chen2025codesteersymbolicaugmentedlanguagemodels,
	title={CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance},
	author={Yongchao Chen and Yilun Hao and Yueying Liu and Yang Zhang and Chuchu Fan},
	year={2025},
	eprint={2502.04350},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2502.04350},
	}
	```

	```md
	@article{chen2024steering,
	title={Steering Large Language Models between Code Execution and Textual Reasoning},
	author={Chen, Yongchao and Jhamtani, Harsh and Sharma, Srinagesh and Fan, Chuchu and Wang, Chi},
	journal={arXiv preprint arXiv:2410.03524},
	year={2024}
	}
	```