Wanfq committed · verified
Commit 388d3b1 · 1 Parent(s): e9183d5

Update README.md

Files changed (1)
  1. README.md +8 -12
README.md CHANGED
@@ -15,13 +15,7 @@ pinned: false
 <p style="font-size: 40px; font-weight: bold;">Knowledge Fusion of Large Language Models</p>


- <h4> |<a href="https://arxiv.org/abs/2401.10491"> 📑 FuseLLM Paper @ICLR2024 </a> |
- <a href="https://arxiv.org/abs/2408.07990"> 📑 FuseChat Tech Report </a> |
- <a href="https://arxiv.org/abs/2412.03187"> 📑 WRPO Paper </a> |
- <a href="https://slit-ai.github.io/FuseChat-3.0/"> 🌐 FuseChat-3.0 Website </a> |
-
- |<a href="https://huggingface.co/FuseAI"> 🤗 HuggingFace Repo </a> |
- <a href="https://github.com/fanqiwan/FuseLLM"> 🐱 GitHub Repo </a> |
  </h4>

  <p align="center">
@@ -40,19 +34,21 @@ Welcome to join us!

  ## News

- ### FuseO1-Preview [Comparable to o1-mini on AIME24]

- **Jan 21, 2025:** 🔥 [FuseO1-Preview](https://huggingface.co/collections/FuseAI/fuseo1-preview-678eb56093649b2688bc9977) is our initial endeavor to enhance the System-II reasoning capabilities of large language models (LLMs) through innovative model fusion techniques. By employing advanced [SCE](https://arxiv.org/abs/2408.07990) merging methodologies, we integrate multiple open-source o1-like LLMs into a unified model. Our goal is to incorporate the distinct knowledge and strengths from different reasoning LLMs into a single, unified model with strong System-II reasoning abilities, particularly in mathematics, coding, and science domains.

  To achieve this, we conduct two types of model merging:

- - **Long-Long Reasoning Merging**: This approach involves model fusion across LLMs that utilize long-CoT reasoning, with the goal of enhancing long-CoT reasoning capabilities. The resulted [FuseAI/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-QwQ-SkyT1-32B-Preview) achieves an accuracy of **60.00 on AIME24**, demonstrating significant performance improvements compared to the o1-preview model (44.60) and approaching the performance of the o1-mini model (63.60).
- - **Long-Short Reasoning Merging**: This approach involves model fusion between long-CoT and short-CoT LLMs, aiming to improve reasoning capabilities in both long and short reasoning processes. The resulted [FuseAI/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeekSeekR1-Qwen2.5-Instruct-32B-Preview) is capable of utilizing both long and short reasoning processes and demonstrates relatively strong performance in long reasoning tasks.

  <p align="center">
- <img src="fuseo1-preview.jpg" width="100%"> <br>
  </p>

  ### FuseChat-3.0 [SOTA 8B LLM on AlpacaEval-2 & Arena-Hard]

  - **Dec 12, 2024:** 🔥 We release [FuseChat-3.0](https://huggingface.co/collections/FuseAI/fusechat-30-6752d18dec430bad7a236a75) and the accompanying [Blog Post](https://slit-ai.github.io/FuseChat-3.0/). FuseChat-3.0 contains a series of models crafted to enhance performance by integrating the strengths of multiple source LLMs into more compact target LLMs. To achieve this fusion, we utilized four powerful source LLMs: [Gemma-2-27b-It](https://huggingface.co/google/gemma-2-27b-it), [Mistral-Large-Instruct-2407](https://huggingface.co/mistralai/Mistral-Large-Instruct-2407), [Qwen-2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct), and [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct). For the target LLMs, we employed three widely used smaller models ([Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct), [Gemma-2-9B-It](https://huggingface.co/google/gemma-2-9b-it), and [Qwen-2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)) along with two even more compact models ([Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) and [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)). The implicit model fusion process involves a two-stage training pipeline: Supervised Fine-Tuning (SFT) to mitigate distribution discrepancies between target and source LLMs, and Direct Preference Optimization (DPO) to learn preferences from multiple source LLMs. The resulting FuseChat-3.0 models demonstrated substantial improvements in general conversation, instruction following, mathematics, and coding. Notably, when Llama-3.1-8B-Instruct served as the target LLM, our fusion approach achieved an average improvement of **6.8** points across 14 benchmarks. Moreover, it showed significant improvements of **37.1** and **30.1** points on the instruction-following test sets AlpacaEval-2 and Arena-Hard, respectively.
 
 <p style="font-size: 40px; font-weight: bold;">Knowledge Fusion of Large Language Models</p>


+ <h4> |<a href="https://arxiv.org/abs/2401.10491"> 📑 FuseLLM Paper @ICLR2024 </a> | <a href="https://arxiv.org/abs/2408.07990"> 📑 FuseChat Tech Report </a> | <a href="https://arxiv.org/abs/2412.03187"> 📑 WRPO Tech Report </a> | <a href="https://huggingface.co/FuseAI"> 🤗 HuggingFace Repo </a> | <a href="https://github.com/fanqiwan/FuseLLM"> 🐱 GitHub Repo </a> | <a href="https://huggingface.co/blog/Wanfq/fusechat-3"> 🌐 FuseChat-3.0 Blog </a> | <a href="https://huggingface.co/blog/Wanfq/fuseo1-preview"> 🌐 FuseO1-Preview Blog </a> |
  </h4>

  <p align="center">

  ## News

+ ### FuseO1-Preview [74.0 on AIME24, approaching OpenAI o1's 79.2]

+ - **Jan 21, 2025:** 🔥 [FuseO1-Preview](https://huggingface.co/collections/FuseAI/fuseo1-preview-678eb56093649b2688bc9977) is our initial endeavor to enhance the System-II reasoning capabilities of large language models (LLMs) through innovative model fusion techniques. By employing our advanced [SCE](https://arxiv.org/abs/2408.07990) merging methodology, we integrate multiple open-source o1-like LLMs into a unified model. Our goal is to incorporate the distinct knowledge and strengths of different reasoning LLMs into a single, unified model with strong System-II reasoning abilities, particularly in mathematics, coding, and science domains.

  To achieve this, we conduct two types of model merging:

+ - **Long-Long Reasoning Merging**: This approach fuses LLMs that all employ long-CoT reasoning, with the goal of enhancing long-CoT reasoning capabilities. The resulting [FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview) achieves a Pass@1 accuracy of **74.0 on AIME24**, a significant improvement over OpenAI o1-preview (44.6) and OpenAI o1-mini (63.4), even approaching OpenAI o1 (79.2).
+ - **Long-Short Reasoning Merging**: This approach fuses long-CoT and short-CoT LLMs, aiming to improve reasoning capabilities in both long and short reasoning processes. The resulting [FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview) can leverage both long and short reasoning processes and demonstrates relatively strong performance on long reasoning tasks.
+
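To make the SCE merging above concrete, here is a minimal, self-contained sketch of a Select-Calculate-Erase-style merge on a single parameter tensor. It is written against plain PyTorch and is **not** the released FuseO1 merging code: the function name `sce_merge`, the `select_topk` fraction, and the exact weighting and normalization are illustrative assumptions; see the [SCE paper](https://arxiv.org/abs/2408.07990) and the FuseAI repositories for the actual recipe.

```python
import torch


def sce_merge(base: torch.Tensor, sources: list[torch.Tensor], select_topk: float = 1.0) -> torch.Tensor:
    """Illustrative SCE-style (Select-Calculate-Erase) merge of one parameter tensor.

    base        -- the tensor from the pivot/base model
    sources     -- same-shaped tensors from the source (o1-like) models
    select_topk -- fraction of elements kept in the Select step (1.0 keeps all)
    """
    # Task vectors: how each source model differs from the base.
    deltas = torch.stack([src - base for src in sources])

    # Select: keep only the elements whose values vary most across the source deltas.
    if select_topk < 1.0:
        variance = deltas.var(dim=0)
        k = max(1, int(select_topk * variance.numel()))
        threshold = variance.flatten().kthvalue(variance.numel() - k + 1).values
        deltas = deltas * (variance >= threshold)

    # Calculate: per-element merge weights proportional to each delta's squared magnitude.
    sq = deltas.pow(2)
    weights = sq / sq.sum(dim=0).clamp_min(1e-12)

    # Erase: drop elements whose sign disagrees with the weighted majority sign.
    majority_sign = torch.sign((weights * deltas).sum(dim=0))
    weights = weights * (torch.sign(deltas) == majority_sign)

    # Fuse the surviving deltas back into the base tensor.
    merged_delta = (weights * deltas).sum(dim=0) / weights.sum(dim=0).clamp_min(1e-12)
    return base + merged_delta
```

Applying such a merge tensor-by-tensor across architecturally identical checkpoints is what dedicated tooling (e.g., mergekit) automates at scale; the snippet only conveys the element-wise idea.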

  <p align="center">
+ <img src="./FuseO1-Preview/assets/fuseo1-preview.jpg" width="100%"> <br>
  </p>

+
  ### FuseChat-3.0 [SOTA 8B LLM on AlpacaEval-2 & Arena-Hard]

  - **Dec 12, 2024:** 🔥 We release [FuseChat-3.0](https://huggingface.co/collections/FuseAI/fusechat-30-6752d18dec430bad7a236a75) and the accompanying [Blog Post](https://slit-ai.github.io/FuseChat-3.0/). FuseChat-3.0 contains a series of models crafted to enhance performance by integrating the strengths of multiple source LLMs into more compact target LLMs. To achieve this fusion, we utilized four powerful source LLMs: [Gemma-2-27b-It](https://huggingface.co/google/gemma-2-27b-it), [Mistral-Large-Instruct-2407](https://huggingface.co/mistralai/Mistral-Large-Instruct-2407), [Qwen-2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct), and [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct). For the target LLMs, we employed three widely used smaller models ([Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct), [Gemma-2-9B-It](https://huggingface.co/google/gemma-2-9b-it), and [Qwen-2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)) along with two even more compact models ([Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) and [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)). The implicit model fusion process involves a two-stage training pipeline: Supervised Fine-Tuning (SFT) to mitigate distribution discrepancies between target and source LLMs, and Direct Preference Optimization (DPO) to learn preferences from multiple source LLMs. The resulting FuseChat-3.0 models demonstrated substantial improvements in general conversation, instruction following, mathematics, and coding. Notably, when Llama-3.1-8B-Instruct served as the target LLM, our fusion approach achieved an average improvement of **6.8** points across 14 benchmarks. Moreover, it showed significant improvements of **37.1** and **30.1** points on the instruction-following test sets AlpacaEval-2 and Arena-Hard, respectively.
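For reference, the DPO stage mentioned above optimizes the standard direct preference objective (Rafailov et al., 2023); the formulation below is the generic loss, not a FuseChat-specific variant. In this setting the preferred/dispreferred pair (y_w, y_l) is assumed to be constructed from source-LLM responses, and the reference policy is the SFT-stage target model:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```

Here \(\sigma\) is the logistic function and \(\beta\) controls how far the target model may drift from its SFT reference while absorbing the source models' preferences.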