Update README.md
README.md CHANGED
@@ -53,7 +53,7 @@ I have quantised the GGML files in this repo with the latest version. Therefore
 I use the following command line; adjust for your tastes and needs:
 
 ```
-./main -t 12 -m WizardLM-13B-1.0.
+./main -t 12 -m WizardLM-13B-1.0.ggmlv3.q5_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Below is an instruction that describes a task. Write a response that appropriately completes the request.
 ### Instruction:
 Write a story about llamas
 ### Response:"
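If you script generations rather than run `./main` interactively, the command above can be wrapped from Python. This is a minimal sketch under stated assumptions: the llama.cpp `main` binary and the model file sit in the working directory, `--color` is dropped so the captured output carries no ANSI escape codes, and the `generate` helper is my own, not part of llama.cpp:

```
import subprocess

# Prompt format from the command above (Alpaca-style instruction template).
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n{instruction}\n### Response:"
)

def generate(instruction: str) -> str:
    # Mirrors the flags above: 12 threads, 2048-token context,
    # temperature 0.7, repeat penalty 1.1, unlimited generation (-n -1).
    result = subprocess.run(
        ["./main", "-t", "12",
         "-m", "WizardLM-13B-1.0.ggmlv3.q5_0.bin",
         "-c", "2048", "--temp", "0.7", "--repeat_penalty", "1.1",
         "-n", "-1",
         "-p", PROMPT_TEMPLATE.format(instruction=instruction)],
        capture_output=True, text=True, check=True)
    return result.stdout

print(generate("Write a story about llamas"))
```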
@@ -83,7 +83,7 @@ So if you're able and willing to contribute, it'd be most gratefully received an
 Empowering Large Pre-Trained Language Models to Follow Complex Instructions
 
 <p align="center" width="100%">
-<a ><img src="imgs/WizardLM.png" alt="WizardLM" style="width: 20%; min-width: 300px; display: block; margin: auto;"></a>
+<a ><img src="https://raw.githubusercontent.com/nlpxucan/WizardLM/main/imgs/WizardLM.png" alt="WizardLM" style="width: 20%; min-width: 300px; display: block; margin: auto;"></a>
 </p>
 
 [![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](https://github.com/tatsu-lab/stanford_alpaca/blob/main/LICENSE)
@@ -108,7 +108,7 @@ At present, our core contributors are preparing the **33B** version and we expec
 
 We adopt the automatic evaluation framework based on GPT-4 proposed by FastChat to assess the performance of chatbot models. As shown in the following figure, WizardLM-13B achieved better results than Vicuna-13b.
 <p align="center" width="100%">
-<a ><img src="imgs/WizarLM13b-GPT4.png" alt="WizardLM" style="width: 100%; min-width: 300px; display: block; margin: auto;"></a>
+<a ><img src="https://raw.githubusercontent.com/nlpxucan/WizardLM/main/imgs/WizarLM13b-GPT4.png" alt="WizardLM" style="width: 100%; min-width: 300px; display: block; margin: auto;"></a>
 </p>
 
 ### WizardLM-13B performance on different skills.
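The GPT-4-based evaluation referenced above works by showing the judge model a question together with two candidate answers and asking it to score both. The sketch below illustrates that idea only; it is not FastChat's actual code. It assumes the official `openai` Python client (v1 API) with `OPENAI_API_KEY` set, and the judge prompt is my own paraphrase:

```
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Paraphrased judge prompt; FastChat's real template is more detailed.
JUDGE_TEMPLATE = """Rate the two answers to the question on a 1-10 scale.
Reply with exactly two numbers separated by a space, answer 1 first.

Question: {question}

Answer 1: {answer_1}

Answer 2: {answer_2}"""

def judge(question: str, answer_1: str, answer_2: str) -> tuple[float, float]:
    # One pairwise judgment; careful evaluations also swap answer order
    # to cancel the judge's position bias, then average over the test set.
    reply = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[{"role": "user", "content": JUDGE_TEMPLATE.format(
            question=question, answer_1=answer_1, answer_2=answer_2)}],
    )
    first, second = reply.choices[0].message.content.split()[:2]
    return float(first), float(second)
```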
@@ -116,7 +116,7 @@ We adopt the automatic evaluation framework based on GPT-4 proposed by FastChat
 The following figure compares the skills of WizardLM-13B and ChatGPT on the Evol-Instruct test set. The result indicates that WizardLM-13B achieves 89.1% of ChatGPT’s performance on average, reaching roughly 100% (or more) of ChatGPT’s capacity on 10 skills and more than 90% on 22 skills.
 
 <p align="center" width="100%">
-<a ><img src="imgs/evol-testset_skills-13b.png" alt="WizardLM" style="width: 100%; min-width: 300px; display: block; margin: auto;"></a>
+<a ><img src="https://raw.githubusercontent.com/nlpxucan/WizardLM/main/imgs/evol-testset_skills-13b.png" alt="WizardLM" style="width: 100%; min-width: 300px; display: block; margin: auto;"></a>
 </p>
 
 ## Call for Feedback
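The per-skill "capacity" above is simply WizardLM-13B's GPT-4 score divided by ChatGPT's score on the same skill. A toy illustration of how such numbers aggregate; the scores are invented for the example and are not the paper's data:

```
# Invented per-skill GPT-4 scores as (wizardlm, chatgpt) pairs -- not real data.
scores = {"writing": (8.9, 8.7), "math": (6.1, 7.8), "coding": (5.6, 7.0)}

# Capacity = WizardLM's score relative to ChatGPT's on the same skill.
capacity = {skill: w / c for skill, (w, c) in scores.items()}

average = sum(capacity.values()) / len(capacity)
print(f"average capacity: {average:.1%}")
print("skills at or above ChatGPT:", [s for s, r in capacity.items() if r >= 1.0])
print("skills above 90%:", [s for s, r in capacity.items() if r >= 0.9])
```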
@@ -135,11 +135,11 @@ We just sample some cases to demonstrate the performance of WizardLM and ChatGPT
 [Evol-Instruct](https://github.com/nlpxucan/evol-instruct) is a novel method that uses LLMs instead of humans to automatically mass-produce open-domain instructions across a range of difficulty levels and skills, in order to improve the performance of LLMs.
 
 <p align="center" width="100%">
-<a ><img src="imgs/git_overall.png" alt="WizardLM" style="width: 86%; min-width: 300px; display: block; margin: auto;"></a>
+<a ><img src="https://raw.githubusercontent.com/nlpxucan/WizardLM/main/imgs/git_overall.png" alt="WizardLM" style="width: 86%; min-width: 300px; display: block; margin: auto;"></a>
 </p>
 
 <p align="center" width="100%">
-<a ><img src="imgs/git_running.png" alt="WizardLM" style="width: 86%; min-width: 300px; display: block; margin: auto;"></a>
+<a ><img src="https://raw.githubusercontent.com/nlpxucan/WizardLM/main/imgs/git_running.png" alt="WizardLM" style="width: 86%; min-width: 300px; display: block; margin: auto;"></a>
 </p>
 
 ## Contents
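Concretely, Evol-Instruct feeds an existing instruction back into an LLM with a rewriting meta-prompt: "in-depth" evolution makes the instruction harder (extra constraints, deeper reasoning), while "in-breadth" evolution spawns a new instruction on a related but rarer topic, and failed rewrites are filtered out. A minimal sketch of one evolution step, assuming the `openai` v1 Python client; the meta-prompts are paraphrases, not the paper's exact wording:

```
import random
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

META_PROMPTS = {
    # Paraphrased evolution operators; the paper's exact prompts differ.
    "in_depth": ("Rewrite the instruction below into a more complex version "
                 "by adding one extra constraint or reasoning step. "
                 "Reply with only the rewritten instruction.\n\n{instruction}"),
    "in_breadth": ("Create a brand-new instruction in the same domain as, but "
                   "rarer than, the instruction below. "
                   "Reply with only the new instruction.\n\n{instruction}"),
}

def evolve(instruction: str) -> str:
    # One evolution step: pick an operator at random, ask the LLM to rewrite.
    template = META_PROMPTS[random.choice(list(META_PROMPTS))]
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": template.format(instruction=instruction)}],
    )
    return reply.choices[0].message.content.strip()

# Several rounds yield instructions of increasing difficulty and variety.
seed = "Write a story about llamas"
for _ in range(3):
    seed = evolve(seed)
    print(seed)
```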
@@ -254,12 +254,12 @@ To evaluate Wizard, we conduct human evaluation on the inputs from our human ins
 
 WizardLM achieved significantly better results than Alpaca and Vicuna-7b.
 <p align="center" width="60%">
-<a ><img src="imgs/win.png" alt="WizardLM" style="width: 60%; min-width: 300px; display: block; margin: auto;"></a>
+<a ><img src="https://raw.githubusercontent.com/nlpxucan/WizardLM/main/imgs/win.png" alt="WizardLM" style="width: 60%; min-width: 300px; display: block; margin: auto;"></a>
 </p>
 
 In the high-difficulty section of our test set (difficulty level >= 8), WizardLM even outperforms ChatGPT, with a win rate 7.9 points higher than ChatGPT’s (42.9% vs. 35.0%). This indicates that our method can significantly improve the ability of large language models to handle complex instructions.
 <p align="center" width="60%">
-<a ><img src="imgs/windiff.png" alt="WizardLM" style="width: 60%; min-width: 300px; display: block; margin: auto;"></a>
+<a ><img src="https://raw.githubusercontent.com/nlpxucan/WizardLM/main/imgs/windiff.png" alt="WizardLM" style="width: 60%; min-width: 300px; display: block; margin: auto;"></a>
 </p>
 
 ### Citation