---
datasets:
- ulab-ai/FusionBench
---

# Fusing LLM Capabilities with Routing Data

<p align="center">
    <a href="https://ulab-uiuc.github.io/FusionFactory/">
        <img alt="Project Page" src="https://img.shields.io/badge/Project-Page-blue">
    </a>
    <a href="http://arxiv.org/abs/2507.10540">
        <img alt="arXiv" src="https://img.shields.io/badge/arXiv-2507.10540-red?logo=arxiv">
    </a>
    <a href="https://github.com/ulab-uiuc/FusionFactory/blob/master/LICENSE">
        <img alt="License" src="https://img.shields.io/badge/LICENSE-MIT-green">
    </a>
    <br>
    <a href="https://github.com/ulab-uiuc/FusionFactory">
        <img alt="Stars" src="https://img.shields.io/github/stars/ulab-uiuc/FusionFactory">
    </a>
    <a href="https://github.com/ulab-uiuc/FusionFactory">
        <img alt="Forks" src="https://img.shields.io/github/forks/ulab-uiuc/FusionFactory">
    </a>
    <a href="https://github.com/ulab-uiuc/FusionFactory">
        <img alt="Issues" src="https://img.shields.io/github/issues/ulab-uiuc/FusionFactory">
    </a>
</p>

<p align="center">
    <a href="https://ulab-uiuc.github.io/FusionFactory/">🌐 Project Page</a> |
    <a href="http://arxiv.org/abs/2507.10540">📜 arXiv</a> |
    <a href="https://huggingface.co/datasets/ulab-ai/FusionBench">📂 Dataset</a> |
    <a href="https://huggingface.co/ulab-ai/FusionFactory">🤖 Model</a> |
    <a href="https://huggingface.co/spaces/ulab-ai/RoutePilot">🖥️ Demo</a>
</p>

<div align="center">
    <img src="./figures/fusion.jpg" width="700" alt="FusionBench">
    <p><b>Overview of LLM capability fusion via FusionFactory at three representative levels: query-level, thought-level, and model-level.</b></p>
</div>

## News

**[2025.06]** 🌟 **FusionFactory** was released.

## 🛠️ Environment Setup

```bash
conda create -n fusionfactory python=3.9
conda activate fusionfactory
pip install pandas datasets tqdm transformers sentence_transformers torch numpy
```
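
Once the environment is ready, you can sanity-check it by loading the routing data straight from the Hugging Face Hub. A minimal sketch, assuming the dataset loads with its default configuration and a `train` split; check the [dataset card](https://huggingface.co/datasets/ulab-ai/FusionBench) for the configurations and splits that actually exist:

```python
from datasets import load_dataset

# Pull the FusionBench routing data from the Hub.
# NOTE: the split name below is an assumption; see the dataset card
# for the actual configurations and splits.
ds = load_dataset("ulab-ai/FusionBench", split="train")

print(ds)     # schema and number of rows
print(ds[0])  # one routing record
```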

## 🎯 Data Process

Run the following command to start data collection.

```bash
# split: train OR test
# case_num: 500 for train & 50 for partial test
# sample LLM description file: ./data_process/LLM_Descriptions.json
python data_process/data_combine.py \
    --split train \
    --case_num 500 \
    --round 5 \
    --llm_description_path [YOUR_LLM_PATH] \
    --csv_save_path [YOUR_SAVE_PATH] \
    --api_base [YOUR_API_BASE] \
    --api_key [YOUR_API_KEY]
```

You may refer to the README in the [`data_process`](data_process/README.md) directory for detailed argument descriptions.

To add quality scores to the collected data using an LLM judge:

```bash
python data_process/add_llm_judge.py
```

This evaluates each response and appends a quality score to the dataset, which can then be used for training and evaluation. See [`data_process/README.md`](data_process/README.md) for more details.
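
Under the hood, LLM-judge scoring amounts to asking a grader model to rate each (query, response) pair. The sketch below shows the idea only, not the script's actual implementation: the column names, prompt, judge model, and the extra `openai` dependency are all assumptions.

```python
import pandas as pd
from openai import OpenAI  # assumed extra dependency, not in the setup above

# Illustrative only: the real logic lives in data_process/add_llm_judge.py.
client = OpenAI(base_url="[YOUR_API_BASE]", api_key="[YOUR_API_KEY]")

def judge_score(query: str, response: str) -> float:
    """Ask an LLM judge to rate a response on a 1-5 scale."""
    prompt = (
        "Rate the following response to the query on a scale of 1 to 5. "
        "Reply with a single number.\n\n"
        f"Query: {query}\nResponse: {response}"
    )
    out = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical judge model
        messages=[{"role": "user", "content": prompt}],
    )
    return float(out.choices[0].message.content.strip())

df = pd.read_csv("routing_data.csv")  # hypothetical file and column names
df["judge_score"] = [judge_score(q, r) for q, r in zip(df["query"], df["response"])]
df.to_csv("routing_data_with_judge.csv", index=False)
```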

## 📊 Experiments

### Query-level Fusion

First, run the data preprocessing script to prepare the dataset:

```bash
# Preprocess the dataset and generate training/testing files
python query_level/data_processing.py
```

For more detailed information about the data preprocessing and model training process, please refer to the README in the [`query_level`](query_level/README.md) directory.
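
Query-level fusion trains a router that sends each query to the LLM most likely to answer it well. As a conceptual illustration only (the router trained in `query_level/` may look quite different), here is an embedding-similarity router built on the `sentence_transformers` dependency installed above; the encoder choice and capability profiles are invented for the example:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Toy "capability profiles": one embedding per candidate LLM.
llm_profiles = {
    "llama-3.1-8b-instruct": encoder.encode("math and code questions"),
    "qwen2.5-7b-instruct": encoder.encode("reading comprehension and open-domain QA"),
}

def route(query: str) -> str:
    """Return the LLM whose profile has the highest cosine similarity."""
    q = encoder.encode(query)
    scores = {
        name: float(np.dot(q, p) / (np.linalg.norm(q) * np.linalg.norm(p)))
        for name, p in llm_profiles.items()
    }
    return max(scores, key=scores.get)

print(route("Solve x^2 - 5x + 6 = 0"))
```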

### Thought-level Fusion

First, run the data preprocessing script to prepare the thought prompts:

```bash
# Preprocess the dataset and generate training/testing files
python query_level/data_processing.py
```

Alternatively, run the following script to generate thought-enhanced queries directly from the Hugging Face dataset:

```bash
python thought_level/get_thought_prompt.py
```

For more detailed information about the data preprocessing and model training process, please refer to the README in the [`thought_level`](thought_level/README.md) directory.
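
Conceptually, a thought-enhanced query prepends distilled reasoning hints ("thoughts") harvested from strong LLM responses to the original query. A toy illustration of the idea; the real template and field names are defined in `thought_level/get_thought_prompt.py` and will differ:

```python
def make_thought_prompt(query: str, thoughts: list[str]) -> str:
    """Prepend reasoning hints to a query (illustrative template only)."""
    hints = "\n".join(f"- {t}" for t in thoughts)
    return (
        "Here are hints that may help answer the question:\n"
        f"{hints}\n\n"
        f"Question: {query}"
    )

print(make_thought_prompt(
    "What is 17 * 24?",
    ["Break the multiplication into (17 * 20) + (17 * 4)."],
))
```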

### Model-level Fusion

You can refer to [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) for detailed instructions on fine-tuning with model-level fusion data. Make sure to first clone the LLaMA-Factory repository into the FusionBench directory, then execute the following commands to generate SFT data for model-level fusion:

```bash
# setting: perf, judge, hybrid, baseline
python model_level/sft_data_gen.py --setting perf --k 5 --save_path [YOUR_PATH] --csv_path_with_judge [YOUR_PATH]

python model_level/sft_test_gen.py --save_path [YOUR_PATH] --csv_path [YOUR_PATH]
```

Then, after the essential configuration described in the [LLaMA-Factory docs](https://llamafactory.readthedocs.io/en/latest/), you can start SFT and inference:

```bash
# SFT
FORCE_TORCHRUN=1 CUDA_VISIBLE_DEVICES=2,3,4,5 llamafactory-cli train examples/train_lora/[YOUR_YAML].yaml

# Inference
CUDA_VISIBLE_DEVICES=2,3,4,5 python scripts/vllm_infer.py --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --adapter_name_or_path saves/llama3.1-8b/lora/[YOUR_PATH] --dataset router_test --cutoff_len 2048
```

You may refer to the README in the [`model_level`](model_level/README.md) directory for detailed instructions.
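
One configuration step that is easy to miss: LLaMA-Factory only trains and infers on datasets registered in its `data/dataset_info.json`. The entry below is a sketch of what registering the generated test set might look like, assuming `sft_test_gen.py` emits alpaca-style `instruction`/`input`/`output` records; the file name and column mapping are assumptions, so match them to your actual output:

```json
{
  "router_test": {
    "file_name": "router_test.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}
```

The key `router_test` is what the `--dataset router_test` flag in the inference command above refers to.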

## 📈 Evaluation

FusionBench provides a comprehensive evaluation framework that assesses model performance across multiple task types:

- Mathematical Reasoning (GSM8K, MATH)
- Code Generation (MBPP, HumanEval)
- Commonsense Reasoning (CommonsenseQA, OpenBookQA, ARC Challenge, HellaSwag)
- World Knowledge (Natural Questions, TriviaQA)
- Reading Comprehension (SQuAD, BoolQ)
- Popular Benchmarks (MMLU, GPQA)

To evaluate your model's performance:

```bash
python eval/response_eval.py
```

For detailed information about the evaluation framework, supported metrics, and usage instructions, please refer to the [Evaluation Documentation](eval/README.md).
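
The grading logic differs by task family (numeric matching for GSM8K, code execution for MBPP/HumanEval, accuracy for the multiple-choice sets). As a flavor of what grading QA-style responses involves, here is a SQuAD-style exact-match sketch; the normalization rules are common practice, not necessarily the exact ones `eval/response_eval.py` uses:

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, references: list[str]) -> bool:
    """True if the prediction matches any reference after normalization."""
    return any(normalize(prediction) == normalize(ref) for ref in references)

print(exact_match("The Eiffel Tower!", ["Eiffel Tower"]))  # True
```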

## Citation

```bibtex
@article{FusionFactory,
    title={Fusing LLM Capabilities with Routing Data},
    author={Tao Feng and Haozhen Zhang and Zijie Lei and Pengrui Han and Mostofa Patwary and Mohammad Shoeybi and Bryan Catanzaro and Jiaxuan You},
    journal={arXiv preprint arXiv:2507.10540},
    year={2025}
}
```