Update README.md
README.md CHANGED
@@ -20,8 +20,8 @@ library_name: transformers
 
 
 <h4> |<a href="https://arxiv.org/abs/2402.16107"> 📑 Paper </a> |
-<a href="https://huggingface.co/FuseAI"> 🤗
-<a href="https://github.com/fanqiwan/FuseLLM"> 🐱
+<a href="https://huggingface.co/FuseAI"> 🤗 HuggingFace Repo </a> |
+<a href="https://github.com/fanqiwan/FuseLLM"> 🐱 GitHub Repo </a> |
 </h4>
 
 <!-- **Authors:** -->
@@ -38,12 +38,28 @@ _Sun Yat-sen University_
 <img src="./assets/fig_0.png" width="70%"> <br>
 </p>
 
+| Proprietary Models | #Params | MT-Bench | Open Source Models | #Params | MT-Bench |
+|-----------------------------------------------------------------------|---------|----------|-----------------------------------------------------------------------|---------|----------|
+| GPT-4-1106-preview | - | 9.32 | Qwen1.5-72B-Chat | 72B | 8.61 |
+| GPT-4-0613 | - | 9.18 | Nous-Hermes-2-Mixtral-8x7B-DPO | 8x7B | 8.33 |
+| GPT-4-0314 | - | 8.96 | Mixtral-8x7B-Instruct-v0.1 | 8x7B | 8.30 |
+| Mistral Medium | - | 8.61 | 🤗 [FuseChat-7B-VaRM](https://huggingface.co/FuseAI/FuseChat-7B-VaRM) | 7B | 8.22 |
+| GPT-3.5-Turbo-0613 | - | 8.39 | Starling-LM-7B-alpha | 7B | 8.09 |
+| GPT-3.5-Turbo-1106 | - | 8.32 | Tulu-2-DPO-70B | 70B | 7.89 |
+| 🤗 [FuseChat-7B-VaRM](https://huggingface.co/FuseAI/FuseChat-7B-VaRM) | 7B | 8.22 | OpenChat-3.5 | 7B | 7.81 |
+| Claude-2.1 | - | 8.18 | OpenChat-3.5-0106 | 7B | 7.80 |
+| Claude-2.0 | - | 8.06 | WizardLM-70B-v1.0 | 70B | 7.71 |
+| GPT-3.5-Turbo-0314 | - | 7.94 | Yi-34B-Chat | 34B | 7.67 |
+| Claude-1 | - | 7.90 | Nous-Hermes-2-SOLAR-10.7B | 10.7B | 7.66 |
+
+
 </div>
 
 
 ## News
 - **Feb 26, 2024:** 🔥 We release [FuseChat-7B-VaRM](https://huggingface.co/FuseAI/FuseChat-7B-VaRM), which is the fusion of three prominent chat LLMs with diverse architectures and scales, namely [NH2-Mixtral-8x7B](https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO), [NH2-Solar-10.7B](https://huggingface.co/NousResearch/Nous-Hermes-2-SOLAR-10.7B), and [OpenChat-3.5-7B](https://huggingface.co/openchat/openchat_3.5). FuseChat-7B-VaRM achieves an average score of **8.22** on MT-Bench, outperforming powerful chat LLMs at the 7B and 34B scales such as [Starling-7B](https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha) and [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat), surpassing even [GPT-3.5 (March)](https://platform.openai.com/docs/models/gpt-3-5-turbo) and [Claude-2.1](https://www.anthropic.com/news/claude-2-1), and approaching [Mixtral-8x7B-Instruct](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
 
+- **Feb 25, 2024:** 🔥 We release [FuseChat-Mixture](https://huggingface.co/datasets/FuseAI/FuseChat-Mixture), a comprehensive training dataset that covers different styles and capabilities, featuring both human-written and model-generated samples, and spanning general instruction-following and specific skills.
 
 ## Contents
 
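For readers who want to pull the artifacts mentioned in the news items above, a minimal download sketch using the `huggingface_hub` CLI (the local directories are placeholders, and any equivalent download method works just as well):

```bash
# Fetch the released model and the training dataset from the Hugging Face Hub.
pip install -U "huggingface_hub[cli]"
huggingface-cli download FuseAI/FuseChat-7B-VaRM --local-dir ./FuseChat-7B-VaRM
huggingface-cli download FuseAI/FuseChat-Mixture --repo-type dataset --local-dir ./FuseChat-Mixture
```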
@@ -70,7 +86,7 @@ Moreover, we argue that the concept of knowledge fusion adopted by both FuseChat
 
 ## Model Release
 
-We release [FuseChat-7B-VaRM](https://huggingface.co/FuseAI/FuseChat-7B-VaRM), which is the fusion of three prominent chat LLMs with diverse architectures and scales, namely [NH2-Mixtral-8x7B](https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO), [NH2-Solar-10.7B](https://huggingface.co/NousResearch/Nous-Hermes-2-SOLAR-10.7B), and [OpenChat-3.5-7B](https://huggingface.co/openchat/openchat_3.5).
+We release [FuseChat-7B-VaRM](https://huggingface.co/FuseAI/FuseChat-7B-VaRM), which is the fusion of three prominent chat LLMs with diverse architectures and scales, namely [NH2-Mixtral-8x7B](https://huggingface.co/NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO), [NH2-Solar-10.7B](https://huggingface.co/NousResearch/Nous-Hermes-2-SOLAR-10.7B), and [OpenChat-3.5-7B](https://huggingface.co/openchat/openchat_3.5). FuseChat-7B-VaRM achieves an average score of **8.22** on MT-Bench, outperforming powerful chat LLMs at the 7B and 34B scales such as [Starling-7B](https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha) and [Yi-34B-Chat](https://huggingface.co/01-ai/Yi-34B-Chat), surpassing even [GPT-3.5 (March)](https://platform.openai.com/docs/models/gpt-3-5-turbo) and [Claude-2.1](https://www.anthropic.com/news/claude-2-1), and approaching [Mixtral-8x7B-Instruct](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
 
 To support a plug-and-play fusion of new source LLMs, we release our target LLMs: [OpenChat-3.5-7B-Solar](https://huggingface.co/FuseAI/OpenChat-3.5-7B-Solar) and [OpenChat-3.5-7B-Mixtral](https://huggingface.co/FuseAI/OpenChat-3.5-7B-Mixtral), which are obtained from pair-wise knowledge fusion. Integrating a new source LLM at any scale requires only obtaining a target LLM from the new source LLM and merging it with the existing target LLMs, as sketched below.
 
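To make the plug-and-play fusion above concrete, here is a minimal sketch that merges the two released target LLMs with a hypothetical target LLM obtained from a new source model. It uses mergekit's generic `linear` method purely for illustration; `<path_to_new_target_llm>`, the weights, and the config filename are placeholders, not the recipe behind FuseChat-7B-VaRM (which uses the repository's own configs under `merge/mergekit_configs/` and its VaRM merging).

```bash
# Illustrative only: fold a new target LLM into the existing ones with mergekit.
# Assumes mergekit is installed and the new target LLM has already been trained.
cat > merge/mergekit_configs/fusechat-new-source.yml <<'EOF'
# Hypothetical config: weighted linear merge of three target LLMs.
models:
  - model: FuseAI/OpenChat-3.5-7B-Solar
    parameters:
      weight: 0.33
  - model: FuseAI/OpenChat-3.5-7B-Mixtral
    parameters:
      weight: 0.33
  - model: <path_to_new_target_llm>  # target LLM obtained from the new source LLM
    parameters:
      weight: 0.34
merge_method: linear
dtype: bfloat16
EOF

export CUDA_VISIBLE_DEVICES=0
mergekit-yaml merge/mergekit_configs/fusechat-new-source.yml "<path_to_save_fusechat_7b_new>"
```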
@@ -333,7 +349,7 @@ torchrun --nproc_per_node=8 --master_port=20001 /train/train.py \
 We show the scripts to obtain the final FuseChat using different merging methods.
 
 ```bash
-# For "slerp", "ta", "ties", and "dare" methods
+# For "slerp", "ta", "ties", and "dare" methods (Please install "mergekit")
 export CUDA_VISIBLE_DEVICES=0
 mergekit-yaml merge/mergekit_configs/fusechat-slerp.yml "<path_to_save_fusechat_7b_slerp>"
 mergekit-yaml merge/mergekit_configs/fusechat-ta.yml "<path_to_save_fusechat_7b_ta>"
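The new comment above asks the reader to install `mergekit` but does not pin an installation method. One common route is installing from source, sketched below under that assumption; check the mergekit project for the exact version the FuseChat configs were written against.

```bash
# Assumed setup for the mergekit-yaml CLI used above; pin a release tag or
# commit if you need reproducible merge behavior.
git clone https://github.com/arcee-ai/mergekit.git
cd mergekit
pip install -e .

# Sanity check that the CLI is available on PATH.
mergekit-yaml --help
```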