---
license: cc-by-4.0
library_name: transformers
pipeline_tag: text-generation
datasets:
- yongchao98/SymBench
metrics:
- accuracy
base_model:
- meta-llama/Llama-3.1-8B-Instruct
tags:
- Symbolic
- Code
- Text
---
# CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
<img src="./Figures/Tag.png" width="650px" alt="s" />
This repository contains the code, models, and datasets for the following papers:
- [CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance](https://arxiv.org/abs/2502.04350)
- [Steering Large Language Models between Code Execution and Textual Reasoning (ICLR'2025)](https://arxiv.org/pdf/2410.03524)
Project page: https://github.com/yongchao98/CodeSteer-v1.0/
[Code](https://github.com/yongchao98/CodeSteer-v1.0)   
[Huggingface🤗](https://huggingface.co/yongchao98/CodeSteer-v1)   
[Model Weights](https://drive.google.com/drive/folders/1qb_rec6f8rMYtFKm0eQpad0L0uHCwgpL?usp=share_link)
[SymBench🤗](https://huggingface.co/datasets/yongchao98/SymBench)   
[Finetune Datasets](https://drive.google.com/drive/folders/1Byn-99gFd5ckRkPMJ8-zagzW7XDfO8ie?usp=share_link)   
[SymBench Datasets](https://github.com/yongchao98/CodeSteer-v1.0/tree/main/dataset_gather)   
[SymBench Synthesis Scripts](https://github.com/yongchao98/CodeSteer-v1.0/tree/main/benchmark)
## Contents
- [Framework](#Framework)
- [Inspirations](#Inspirations)
- [Performance](#Performance)
- [Environment_Setup](#Environment_Setup)
- [LLM_API_Key_Setup](#LLM_API_Key_Setup)
- [Train_and_Test_Models](#Train_and_Test_Models)
- [Assistance](#Assistance)
- [Citation](#Citation)
## Framework
<img src="./Figures/CodeSteer-intro.png" width="800px" alt="s" />
<p align="center" style="font-size: 16px;">
Figure: CodeSteer guides LLM code/text generation to integrate symbolic computing. At each interaction with the TaskLLM, it reviews the current and previous answers, then provides guidance for the next round.
</p>
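Below is a minimal sketch of this guidance loop; the function names (`task_llm_answer`, `codesteer_guidance`) and the stopping logic are hypothetical placeholders, not the repository's actual API.

```python
# Hypothetical sketch of the multi-round CodeSteer guidance loop (not the repo's actual API).
# Each round: the TaskLLM answers under the current guidance, then the CodeSteerLLM
# reviews the question plus all previous (guidance, answer) pairs and either accepts
# the answer or issues new code/text guidance for the next round.
def codesteer_loop(question, task_llm_answer, codesteer_guidance, max_rounds=5):
    history = []  # (guidance, answer) pairs from previous rounds
    guidance = "Decide whether code or textual reasoning fits this task."
    for _ in range(max_rounds):
        answer = task_llm_answer(question, guidance)      # TaskLLM generates code or text
        history.append((guidance, answer))
        guidance = codesteer_guidance(question, history)  # CodeSteerLLM reviews all rounds
        if guidance is None:                              # answer accepted
            return answer
    return history[-1][1]  # fall back to the last answer after max_rounds
```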
## Inspirations
<img src="./Figures/LLM-makes-simple-mistakes-gather.png" width="800px" alt="s" />
<p align="center" style="font-size: 16px;">
Figure: Cases where GPT-4o makes simple mistakes with direct textual reasoning but reliably solves the problem when prompted to use code.
</p>
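As a toy illustration (not drawn from the paper's figure), sub-tasks such as counting letters or comparing decimals are error-prone in pure textual reasoning but trivial once the model emits a short program:

```python
# Toy illustration: symbolic sub-tasks that are trivial in code but often botched in text.
word = "strawberry"
print(word.count("r"))  # 3

print(9.11 > 9.9)       # False: numeric comparison is exact in code
```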
## Performance
We compare GPT-4o + CodeSteer with OpenAI o1 and DeepSeek R1 on SymBench, which comprises 28 seen tasks and 9 unseen tasks. GPT-4o + CodeSteer surpasses o1 (82.7), R1 (76.8), and o1-preview (74.8), highlighting the importance of integrating symbolic computing into LLMs.
<img src="./Figures/Table-results.png" width="800px" alt="s" />
The token costs and runtimes of each method are shown below. GPT-4o + CodeSteer uses fewer tokens and less runtime than o1 and R1.
<img src="./Figures/Cost-token-runtime.png" width="800px" alt="s" />
## Environment_Setup
The fine-tuning and inference of CodeSteerLLM are based on [Llama-factory](https://github.com/hiyouga/LLaMA-Factory) with some modules modified by us.
```bash
git clone https://github.com/yongchao98/CodeSteer-v1.0.git
cd CodeSteer-v1.0
conda create -n CodeSteer python=3.10
conda activate CodeSteer
pip install -r requirements.txt
```
## LLM_API_Key_Setup
If you want to use API-based LLMs as the TaskLLM or CodeSteerLLM, you need to set up the corresponding API keys.
1. First, create a .env file in your project root:
```
OPENAI_API_KEY='your_key_here'
CLAUDE_API_KEY='your_key_here'
MIXTRAL_API_KEY='your_key_here'
DEEPSEEK_API_KEY='your_key_here'
```
2. Add this .env file to your .gitignore to prevent accidentally committing it:
```bash
echo ".env" >> .gitignore
```
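If your scripts read these keys from environment variables, a loader such as `python-dotenv` can pull them in from the `.env` file; a minimal sketch (assuming `python-dotenv` is installed):

```python
# Minimal sketch: load API keys from the .env file into environment variables.
# Assumes `pip install python-dotenv`; the variable names match the .env above.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
openai_key = os.getenv("OPENAI_API_KEY")
assert openai_key, "OPENAI_API_KEY is missing from .env"
```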
## Train_and_Test_Models
### Create_test_samples
The synthesized test samples for the 37 tasks of SymBench are in the [dataset_gather](https://github.com/yongchao98/CodeSteer-v1.0/tree/main/dataset_gather) directory. You can also synthesize the samples yourself with tunable complexity using the scripts in [create_dataset](https://github.com/yongchao98/CodeSteer-v1.0/tree/main/create_dataset), or browse the hosted copy as sketched below.
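The SymBench samples are also hosted on the Hugging Face Hub; a minimal sketch for browsing them with the `datasets` library (split and column names are assumptions and may differ from the actual dataset layout):

```python
# Minimal sketch: inspect SymBench samples from the Hugging Face Hub.
# Split and column names are assumptions; check the dataset card for the real layout.
from datasets import load_dataset

symbench = load_dataset("yongchao98/SymBench")
print(symbench)                               # available splits and columns
first_split = next(iter(symbench.values()))
print(first_split[0])                         # one raw sample
```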
### Run inference without a GPU, testing a closed-source LLM as CodeSteerLLM
We can directly use an unfinetuned model such as GPT-4o as the CodeSteerLLM; in this case, run
```bash
python benchmark_test_baseline.py
```
### Run inference with GPUs, testing the finetuned CodeSteerLLM
We can run inference with the finetuned Llama-3.1-8B on our own GPUs (the default setting in infer_CodeSteer.sh uses 4×H100 on the Harvard cluster; modify it freely for your own cluster). You can also download the [Model Weights](https://drive.google.com/drive/folders/1qb_rec6f8rMYtFKm0eQpad0L0uHCwgpL?usp=share_link) locally and change the path in llama3_8B_CodeSteer.yaml.
```bash
bash infer_CodeSteer.sh
# default config file is ./llama3_8B_CodeSteer.yaml using the model uploaded on Huggingface.
```
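Alternatively, the released checkpoint can be loaded directly with 🤗 Transformers; a minimal sketch (the prompt below is illustrative, and the exact prompt format should follow the repository's inference scripts):

```python
# Minimal sketch: load the CodeSteer-v1 checkpoint with 🤗 Transformers.
# The prompt is illustrative; follow infer_CodeSteer.sh for the exact prompt format.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="yongchao98/CodeSteer-v1",
    device_map="auto",
    torch_dtype="auto",
)
prompt = "Question: ...\nPrevious answers: ...\nProvide guidance for the next round."
print(generator(prompt, max_new_tokens=256)[0]["generated_text"])
```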
### Finetuning CodeSteerLLM with synthesized data
Our synthesized datasets for both SFT and DPO fine-tuning are in [Finetune Datasets](https://drive.google.com/drive/folders/1Byn-99gFd5ckRkPMJ8-zagzW7XDfO8ie?usp=share_link).
We use LLaMA-Factory and DeepSpeed for the fine-tuning process. First install LLaMA-Factory with:
```bash
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]"
cd ..
```
Then run the training with (the default setting in train_llama3-8B-CodeSteer.sh uses 4×H100 on the Harvard cluster; modify it freely for your own cluster):
```bash
bash train_llama3-8B-CodeSteer.sh
```
## Assistance
We appreciate all feedback! Feel free to raise an issue for bugs, questions, or suggestions. Contact [Yongchao Chen](https://yongchao98.github.io/YongchaoChen/) and [Chuchu Fan](https://chuchu.mit.edu) for questions and discussion.
## Citation
```bibtex
@misc{chen2025codesteersymbolicaugmentedlanguagemodels,
title={CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance},
author={Yongchao Chen and Yilun Hao and Yueying Liu and Yang Zhang and Chuchu Fan},
year={2025},
eprint={2502.04350},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.04350},
}
```
```bibtex
@article{chen2024steering,
title={Steering Large Language Models between Code Execution and Textual Reasoning},
author={Chen, Yongchao and Jhamtani, Harsh and Sharma, Srinagesh and Fan, Chuchu and Wang, Chi},
journal={arXiv preprint arXiv:2410.03524},
year={2024}
}
``` |