---
datasets:
- ulab-ai/FusionBench
---
# Fusing LLM Capabilities with Routing Data

<p align="center">
    <a href="https://ulab-uiuc.github.io/FusionFactory/">
        <img alt="Project Page" src="https://img.shields.io/badge/Project-Page-blue">
    </a>
    <a href="http://arxiv.org/abs/2507.10540">
        <img alt="arXiv" src="https://img.shields.io/badge/arXiv-2507.10540-red?logo=arxiv">
    </a>
    <a href="https://github.com/ulab-uiuc/FusionFactory/blob/master/LICENSE">
        <img alt="License" src="https://img.shields.io/badge/LICENSE-MIT-green">
    </a>
    <br>
    <a href="https://github.com/ulab-uiuc/FusionFactory">
        <img alt="Stars" src="https://img.shields.io/github/stars/ulab-uiuc/FusionFactory">
    </a>
    <a href="https://github.com/ulab-uiuc/FusionFactory">
        <img alt="Forks" src="https://img.shields.io/github/forks/ulab-uiuc/FusionFactory">
    </a>
    <a href="https://github.com/ulab-uiuc/FusionFactory">
        <img alt="Issues" src="https://img.shields.io/github/issues/ulab-uiuc/FusionFactory">
    </a>
</p>

<p align="center">
    <a href="https://ulab-uiuc.github.io/FusionFactory/">🌐 Project Page</a> |
    <a href="http://arxiv.org/abs/2507.10540">📜 arXiv</a> |
    <a href="https://huggingface.co/datasets/ulab-ai/FusionBench">📂 Dataset</a> |
    <a href="https://huggingface.co/ulab-ai/FusionFactory">🤖 Model</a> |
    <a href="https://huggingface.co/spaces/ulab-ai/RoutePilot">🖥️ Demo</a>
</p>




<div align="center">
  <img src="./figures/fusion.jpg" width="700" alt="FusionBench">
  <p><b>Overview of LLM capability fusion via FusionFactory with three representative levels: Query-level, Thought-level, and Model-level.</b></p>
</div>


## News

**[2025.06]** 🌟 **FusionFactory** was released.



## 🛠️ Environment Setup

```bash
conda create -n fusionfactory python=3.9
conda activate fusionfactory
pip install pandas datasets tqdm transformers sentence_transformers torch numpy
```



## 🎯 Data Process

Run the following command to start data collection:

```bash
# split: train OR test
# case_num: 500 for train, 50 for partial test
# sample LLM description file: ./data_process/LLM_Descriptions.json
python data_process/data_combine.py \
--split train \
--case_num 500 \
--round 5 \
--llm_description_path [YOUR_LLM_PATH] \
--csv_save_path [YOUR_SAVE_PATH] \
--api_base [YOUR_API_BASE] \
--api_key [YOUR_API_KEY]
```

You may refer to the specific README in the [`data_process`](data_process/README.md) directory for detailed argument descriptions.
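Purely as a hypothetical illustration of the description file's role (the authoritative schema is the sample `./data_process/LLM_Descriptions.json` shipped with the repo), an entry might pair a model identifier with a short capability summary:

```python
# Hypothetical structure for LLM_Descriptions.json; the keys and field names
# below are illustrative placeholders. Consult the sample file in
# data_process/ for the actual schema.
llm_descriptions = {
    "llama-3.1-8b-instruct": {
        "description": "General-purpose instruction-following model.",
    },
    "qwen-2.5-coder-7b": {
        "description": "Code-focused model for generation and repair tasks.",
    },
}
```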

To add quality scores to the collected data using an LLM judge:

```bash
python data_process/add_llm_judge.py
```

This evaluates each response and adds quality scores to the dataset for use in training and evaluation. See the [`data_process/README.md`](data_process/README.md) for more details.
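If you prefer to start from the released routing data instead of collecting it yourself, you can load it directly from the Hugging Face Hub. A minimal sketch; inspect the printed splits and columns rather than assuming specific field names:

```python
from datasets import load_dataset

# Load the released FusionBench routing data from the Hugging Face Hub.
ds = load_dataset("ulab-ai/FusionBench")

# Print the available splits and the column names of the first split;
# the exact schema is documented on the dataset card.
print(ds)
first_split = next(iter(ds.values()))
print(first_split.column_names)
```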




## 📊 Experiments


### Query-level Fusion

First, run the data preprocessing script to prepare the dataset:

```bash
# Preprocess the dataset and generate training/testing files
python query_level/data_processing.py
```

For more detailed information about the data preprocessing and model training process, please refer to the specific README in the [`query_level`](query_level/README.md) directory.
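To make the query-level idea concrete, here is a minimal, self-contained sketch of embedding-based routing: embed the training queries, then send each test query to whichever LLM performed best on its nearest training neighbor. The CSV paths and column names (`query`, `best_model`) are hypothetical placeholders, not the actual output format of `data_processing.py`:

```python
import pandas as pd
from sentence_transformers import SentenceTransformer

# Hypothetical files: one row per query, with the name of the best-scoring
# LLM recorded for each training query (paths and column names are placeholders).
train = pd.read_csv("train_routing.csv")  # columns: query, best_model
test = pd.read_csv("test_routing.csv")    # column: query

encoder = SentenceTransformer("all-MiniLM-L6-v2")
train_emb = encoder.encode(train["query"].tolist(), normalize_embeddings=True)
test_emb = encoder.encode(test["query"].tolist(), normalize_embeddings=True)

# 1-nearest-neighbor routing: with L2-normalized embeddings, cosine
# similarity is just a dot product.
nearest = (test_emb @ train_emb.T).argmax(axis=1)
routes = train["best_model"].iloc[nearest].tolist()
print(routes[:5])
```

A learned classifier over the same embeddings is a natural next step; the nearest-neighbor rule just keeps the sketch short.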



### Thought-level Fusion

First, run the data preprocessing script (shared with query-level fusion) to prepare the dataset:

```bash
# Preprocess the dataset and generate training/testing files
python query_level/data_processing.py
```

Alternatively, run the following script to generate thought-enhanced queries directly from the Hugging Face dataset:

```bash
python thought_level/get_thought_prompt.py
```

For more detailed information about the data preprocessing and model training process, please refer to the specific README in the [`thought_level`](thought_level/README.md) directory.
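As a rough illustration of thought-level fusion, the sketch below prepends reasoning traces ("thoughts") from several LLMs to the original query. The template and function name are hypothetical; `thought_level/get_thought_prompt.py` defines the actual construction:

```python
# Hypothetical prompt template; the real one is built by
# thought_level/get_thought_prompt.py.
def build_thought_prompt(query: str, thoughts: list[str]) -> str:
    """Prepend reasoning traces from multiple LLMs to the original query."""
    thought_block = "\n".join(
        f"Thought {i + 1}: {t}" for i, t in enumerate(thoughts)
    )
    return (
        "Reasoning traces from other models:\n"
        f"{thought_block}\n\n"
        f"Question: {query}\nAnswer:"
    )


print(build_thought_prompt("What is 17 * 23?", ["Split it as 17*20 + 17*3."]))
```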


### Model-level Fusion

Refer to [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) for detailed instructions on fine-tuning with model-level fusion data. First clone the LLaMA-Factory repository into the FusionBench directory, then run the following commands to generate SFT data for model-level fusion:


```bash
# setting: perf, judge, hybrid, baseline
python model_level/sft_data_gen.py --setting perf --k 5 --save_path [YOUR_PATH] --csv_path_with_judge [YOUR_PATH]

python model_level/sft_test_gen.py --save_path [YOUR_PATH] --csv_path [YOUR_PATH]
```
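LLaMA-Factory discovers datasets through `data/dataset_info.json`. A minimal registration sketch, assuming the generated SFT file uses the standard Alpaca-style `instruction`/`input`/`output` fields; `router_test` mirrors the dataset name used in the inference command below, and the file names are placeholders:

```json
{
  "router_sft": {
    "file_name": "router_sft.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  },
  "router_test": {
    "file_name": "router_test.json"
  }
}
```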

Then, after completing the essential configuration described in the [LLaMA-Factory documentation](https://llamafactory.readthedocs.io/en/latest/), use the following commands to run SFT and inference:

```bash
# SFT
FORCE_TORCHRUN=1 CUDA_VISIBLE_DEVICES=2,3,4,5 llamafactory-cli train examples/train_lora/[YOUR_YAML].yaml

# Inference
CUDA_VISIBLE_DEVICES=2,3,4,5 python scripts/vllm_infer.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path saves/llama3.1-8b/lora/[YOUR_PATH] --dataset router_test --cutoff_len 2048
```


You may refer to the specific README in the [`model_level`](model_level/README.md) directory for detailed instructions.


## 📈 Evaluation

FusionBench provides a comprehensive evaluation framework for assessing model performance across a range of task types, including:

- Mathematical Reasoning (GSM8K, MATH)
- Code Generation (MBPP, HumanEval)
- Commonsense Reasoning (CommonsenseQA, OpenBookQA, ARC Challenge, HellaSwag)
- World Knowledge (Natural Questions, TriviaQA)
- Reading Comprehension (SQuAD, BoolQ)
- Popular Benchmarks (MMLU, GPQA)

To evaluate your model's performance:

```bash
python eval/response_eval.py
```

For detailed information about the evaluation framework, supported metrics, and usage instructions, please refer to the [Evaluation Documentation](eval/README.md).
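For intuition only, a toy exact-match accuracy over a JSONL predictions file could look like the sketch below. This is not the repository's metric implementation, and the field names (`prediction`, `answer`) are assumptions; `eval/response_eval.py` is the authoritative entry point:

```python
import json

def exact_match_accuracy(path: str) -> float:
    """Toy exact-match accuracy; field names are illustrative assumptions."""
    correct = total = 0
    with open(path) as f:
        for line in f:
            row = json.loads(line)
            correct += row["prediction"].strip() == row["answer"].strip()
            total += 1
    return correct / max(total, 1)


print(exact_match_accuracy("predictions.jsonl"))
```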


## Citation

```bibtex
@article{FusionFactory,
  title={Fusing LLM Capabilities with Routing Data},
  author={Tao Feng and Haozhen Zhang and Zijie Lei and Pengrui Han and Mostofa Patwary and Mohammad Shoeybi and Bryan Catanzaro and Jiaxuan You},
  journal={arXiv preprint arXiv:2507.10540},
  year={2025}
}
```