TheBloke committed on
Commit 7ff9bf2
1 Parent(s): 87017ae

Update README.md

Files changed (1): README.md (+88 -1)

README.md CHANGED

@@ -1,3 +1,90 @@
 ---
-license: other
+language:
+- en
+tags:
+- causal-lm
+- llama
 ---

# Wizard-Vicuna-13B-HF

This is a float16 HF format repo for [junelee's wizard-vicuna 13B](https://huggingface.co/junelee/wizard-vicuna-13b).

June Lee's repo was also in HF format. The reason I've made this one is that the original repo stored the weights in float32, meaning it required roughly 52GB of disk space, VRAM and RAM.

This model was converted to float16, which halves that to roughly 26GB, to make it easier to load and manage.
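
For reference, here is a minimal sketch of loading this float16 repo with the transformers library. It is not an official example: the prompt is a placeholder, and `device_map="auto"` assumes the accelerate package is installed.

```python
# Minimal sketch (not an official example): load the fp16 weights and generate.
# Assumes torch, transformers and accelerate are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/wizard-vicuna-13B-HF"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # load the fp16 weights directly (~26GB, not ~52GB)
    device_map="auto",          # place layers on available GPUs (needs accelerate)
)

prompt = "What is the difference between float16 and float32?"  # placeholder
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

A float32 checkpoint can be shrunk in much the same way: load it, call `model.half()`, then `save_pretrained()` on the result. That is one common approach, not necessarily the exact conversion used for this repo.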

## Repositories available

* [4bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/wizard-vicuna-13B-GPTQ)
* [4bit and 5bit GGML models for CPU inference](https://huggingface.co/TheBloke/wizard-vicuna-13B-GGML)
* [float16 HF format model for GPU inference](https://huggingface.co/TheBloke/wizard-vicuna-13B-HF)

# Original WizardVicuna-13B model card

GitHub page: https://github.com/melodysdreamj/WizardVicunaLM

# WizardVicunaLM

### Wizard's dataset + ChatGPT's conversation extension + Vicuna's tuning method

I am a big fan of the ideas behind WizardLM and VicunaLM. I particularly like the way WizardLM develops its dataset in more depth and breadth, and the way VicunaLM overcomes the limitations of single-turn exchanges by introducing multi-round conversations. I therefore combined these two ideas to create WizardVicunaLM. This project is highly experimental and intended as a proof of concept, not for actual use.

## Benchmark

### Approximately 7% performance improvement over VicunaLM

![](https://user-images.githubusercontent.com/21379657/236088663-3fa212c9-0112-4d44-9b01-f16ea093cb67.png)

### Detail

The questions here do not come from a rigorous test; rather, I asked a handful of questions and had GPT-4 score the answers. The models compared are ChatGPT 3.5, WizardVicunaLM, VicunaLM, and WizardLM, in that order.

| Question | gpt3.5 | wizard-vicuna-13b | vicuna-13b | wizard-7b | link |
|----------|--------|-------------------|------------|-----------|------|
| Q1 | 95 | 90 | 85 | 88 | [link](https://sharegpt.com/c/YdhIlby) |
| Q2 | 95 | 97 | 90 | 89 | [link](https://sharegpt.com/c/YOqOV4g) |
| Q3 | 85 | 90 | 80 | 65 | [link](https://sharegpt.com/c/uDmrcL9) |
| Q4 | 90 | 85 | 80 | 75 | [link](https://sharegpt.com/c/XBbK5MZ) |
| Q5 | 90 | 85 | 80 | 75 | [link](https://sharegpt.com/c/AQ5tgQX) |
| Q6 | 92 | 85 | 87 | 88 | [link](https://sharegpt.com/c/eVYwfIr) |
| Q7 | 95 | 90 | 85 | 92 | [link](https://sharegpt.com/c/Kqyeub4) |
| Q8 | 90 | 85 | 75 | 70 | [link](https://sharegpt.com/c/M0gIjMF) |
| Q9 | 92 | 85 | 70 | 60 | [link](https://sharegpt.com/c/fOvMtQt) |
| Q10 | 90 | 80 | 75 | 85 | [link](https://sharegpt.com/c/YYiCaUz) |
| Q11 | 90 | 85 | 75 | 65 | [link](https://sharegpt.com/c/HMkKKGU) |
| Q12 | 85 | 90 | 80 | 88 | [link](https://sharegpt.com/c/XbW6jgB) |
| Q13 | 90 | 95 | 88 | 85 | [link](https://sharegpt.com/c/JXZb7y6) |
| Q14 | 94 | 89 | 90 | 91 | [link](https://sharegpt.com/c/cTXH4IS) |
| Q15 | 90 | 85 | 88 | 87 | [link](https://sharegpt.com/c/GZiM0Yt) |
| Average | 91 | 88 | 82 | 80 | |
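
The bottom row can be reproduced directly from the per-question scores; a short script to recompute the averages and the headline improvement:

```python
# Recompute the column averages and the relative improvement over VicunaLM
# from the per-question GPT-4 scores in the table above.
scores = {
    "gpt3.5":            [95, 95, 85, 90, 90, 92, 95, 90, 92, 90, 90, 85, 90, 94, 90],
    "wizard-vicuna-13b": [90, 97, 90, 85, 85, 85, 90, 85, 85, 80, 85, 90, 95, 89, 85],
    "vicuna-13b":        [85, 90, 80, 80, 80, 87, 85, 75, 70, 75, 75, 80, 88, 90, 88],
    "wizard-7b":         [88, 89, 65, 75, 75, 88, 92, 70, 60, 85, 65, 88, 85, 91, 87],
}

averages = {name: sum(vals) / len(vals) for name, vals in scores.items()}
for name, avg in averages.items():
    print(f"{name}: {avg:.1f}")  # ~91, ~88, ~82, ~80

improvement = averages["wizard-vicuna-13b"] / averages["vicuna-13b"] - 1
print(f"Improvement over VicunaLM: {improvement:.1%}")  # ~7%
```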

## Principle

We adopted WizardLM's approach of extending a single problem in more depth. However, instead of using individual instructions, we expanded it into Vicuna's conversation format and applied Vicuna's fine-tuning techniques.

Turning a single command into a rich, multi-round conversation is what we've done [here](https://sharegpt.com/c/6cmxqq0).
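
To make the data shape concrete, here is a purely illustrative record in the ShareGPT-style layout commonly used for Vicuna/FastChat fine-tuning. The field names follow that convention and the text is invented; see the released dataset for the real records.

```python
# Purely illustrative: one multi-round record in the ShareGPT-style layout.
example_record = {
    "id": "wizard_vicuna_example_0",
    "conversations": [
        {"from": "human", "value": "Explain how a hash table works."},
        {"from": "gpt",   "value": "A hash table maps keys to buckets with a hash function ..."},
        {"from": "human", "value": "How are collisions handled?"},
        {"from": "gpt",   "value": "Usually by chaining or open addressing ..."},
    ],
}
```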

After creating the training data, I trained the model according to the Vicuna v1.1 [training method](https://github.com/lm-sys/FastChat/blob/main/scripts/train_vicuna_13b.sh).

## Detailed Method

First, we explore and expand various areas within the same topic using the 7K conversations created by WizardLM. However, we produced the data in a continuous conversation format rather than the instruction format: each example starts from a WizardLM instruction and then expands into various related areas within a single conversation, using ChatGPT 3.5.
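
As an illustration only (not the authors' actual pipeline), an extension step like the one described above could be sketched with the OpenAI chat API roughly as follows; the model name, prompts, and loop structure are all assumptions.

```python
# Illustrative sketch only -- not the authors' pipeline. Grows one WizardLM-style
# instruction into a multi-round conversation by repeatedly querying ChatGPT 3.5.
# Assumes the openai>=1.0 client and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

def expand_instruction(instruction: str, rounds: int = 3) -> list[dict]:
    """Return a chat transcript grown from a single instruction."""
    messages = [{"role": "user", "content": instruction}]
    for i in range(rounds):
        # Answer the latest user turn with ChatGPT 3.5.
        answer = client.chat.completions.create(
            model="gpt-3.5-turbo", messages=messages
        ).choices[0].message.content
        messages.append({"role": "assistant", "content": answer})
        if i < rounds - 1:
            # Ask for a natural follow-up question so the conversation keeps
            # expanding into related areas of the same topic.
            follow_up = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=messages + [{
                    "role": "user",
                    "content": "Suggest one natural follow-up question the user "
                               "might ask next. Reply with the question only.",
                }],
            ).choices[0].message.content
            messages.append({"role": "user", "content": follow_up})
    return messages

# Example: expand_instruction("Explain the difference between TCP and UDP.")
```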

After that, we fine-tuned the model on this data using Vicuna's fine-tuning format.

## Training Process

The model was trained with 8 A100 GPUs for 35 hours.

## Weights

You can find the [dataset](https://huggingface.co/datasets/junelee/wizard_vicuna_70k) we used for training and the [13b model](https://huggingface.co/junelee/wizard-vicuna-13b) on Hugging Face.
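
The training conversations can be pulled straight from the Hub with the datasets library; a minimal sketch, assuming the default `train` split:

```python
# Minimal sketch: load the released training conversations from the Hugging Face Hub.
from datasets import load_dataset

dataset = load_dataset("junelee/wizard_vicuna_70k", split="train")  # split name assumed
print(dataset)     # row count and column names
print(dataset[0])  # first multi-round conversation record
```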

## Conclusion

If we extend the conversations to GPT-4 32K, we can expect a dramatic improvement, as we could generate 8x more, and more accurate and richer, conversations.

## License

The model is subject to the LLaMA model license, and the dataset to OpenAI's terms of use because it was generated with ChatGPT. Everything else is free.

## Author

[JUNE LEE](https://github.com/melodysdreamj) - He is active in Songdo Artificial Intelligence Study and GDG Songdo.