hfc971 committed on
Commit
4702645
•
1 Parent(s): d896b11

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +203 -0
README.md ADDED
@@ -0,0 +1,203 @@
+ ---
+ license: cc-by-nc-4.0
+ tags:
+ - merge
+ - mergekit
+ - lazymergekit
+ - dpo
+ - rlhf
+ base_model: mlabonne/Beagle14-7B
+ model-index:
+ - name: NeuralBeagle14-7B
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 72.95
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralBeagle14-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 88.34
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralBeagle14-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 64.55
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralBeagle14-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 69.93
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralBeagle14-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 82.4
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralBeagle14-7B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 70.28
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mlabonne/NeuralBeagle14-7B
+       name: Open LLM Leaderboard
+ ---
+
+ ![](https://i.imgur.com/89ZAKcn.png)
+
+ # 🐶 NeuralBeagle14-7B
+
+ **Update 01/16/24: NeuralBeagle14-7B is (probably) the best 7B model you can find! 🎉**
+
+ NeuralBeagle14-7B is a DPO fine-tune of [mlabonne/Beagle14-7B](https://huggingface.co/mlabonne/Beagle14-7B) using the [argilla/distilabel-intel-orca-dpo-pairs](https://huggingface.co/datasets/argilla/distilabel-intel-orca-dpo-pairs) preference dataset and my DPO notebook from [this article](https://towardsdatascience.com/fine-tune-a-mistral-7b-model-with-direct-preference-optimization-708042745aac).
+
+ It is based on a merge of the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
+ * [fblgit/UNA-TheBeagle-7b-v1](https://huggingface.co/fblgit/UNA-TheBeagle-7b-v1), based on jondurbin's [repo](https://github.com/jondurbin/bagel) and [jondurbin/bagel-v0.3](https://huggingface.co/datasets/jondurbin/bagel-v0.3)
+ * [argilla/distilabeled-Marcoro14-7B-slerp](https://huggingface.co/argilla/distilabeled-Marcoro14-7B-slerp), based on [mlabonne/Marcoro14-7B-slerp](https://huggingface.co/mlabonne/Marcoro14-7B-slerp)
+
+ Thanks to [Argilla](https://huggingface.co/argilla) for providing the dataset and the training recipe [here](https://huggingface.co/argilla/distilabeled-Marcoro14-7B-slerp). 💪
+
+ You can try it out in this [Space](https://huggingface.co/spaces/mlabonne/NeuralBeagle14-7B-GGUF-Chat) (GGUF Q4_K_M).
+
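For readers unfamiliar with DPO, the objective behind the fine-tune described above can be sketched for a single preference pair. This is the generic DPO loss, not the training code from the notebook; the function name `dpo_loss`, its argument names, and the value `beta=0.1` are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Generic DPO loss for one (chosen, rejected) answer pair.

    Inputs are summed log-probabilities of each full answer under the
    policy being trained and under the frozen reference model.
    beta=0.1 is an illustrative value, not the one used for this model.
    """
    # Implicit rewards: how much more likely each answer is under the
    # policy than under the reference model, scaled by beta.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), computed as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# A policy identical to the reference gives a loss of log(2).
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
# A policy that favors the chosen answer more than the reference does gets a lower loss.
print(dpo_loss(-10.0, -20.0, -15.0, -15.0))
```

The loss only depends on how the policy's preference between the two answers differs from the reference model's, which is why DPO needs paired preference data such as the dataset linked above.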
+ ## 🔍 Applications
+
+ This model uses a context window of 8k tokens. It is compatible with different templates, like ChatML and Llama's chat template.
+
+ Compared to other 7B models, it displays good performance in instruction following and reasoning tasks. It can also be used for role-play (RP) and storytelling.
+
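As an illustration of the ChatML template mentioned above, here is a minimal sketch of how a conversation is laid out in that format. The helper name `to_chatml` is hypothetical; in practice the tokenizer's own chat template (applied with `apply_chat_template` in the Usage section) decides the exact prompt, and this README does not say which template the tokenizer ships with.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|> markers.
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
        for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [{"role": "user", "content": "What is a large language model?"}]
print(to_chatml(messages))
```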
+ ## ⚡ Quantized models
+
+ * **GGUF**: https://huggingface.co/mlabonne/NeuralBeagle14-7B-GGUF
+ * **GPTQ**: https://huggingface.co/TheBloke/NeuralBeagle14-7B-GPTQ
+ * **AWQ**: https://huggingface.co/TheBloke/NeuralBeagle14-7B-AWQ
+ * **EXL2**: https://huggingface.co/LoneStriker/NeuralBeagle14-7B-8.0bpw-h8-exl2
+
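To get a rough feel for what quantization saves, the sketch below estimates the size of the weights from a bits-per-weight figure. It is a back-of-the-envelope calculation only: the parameter count of 7e9 is an assumption taken from the nominal "7B" in the model name, the helper `weight_size_gib` is hypothetical, and the estimate ignores the KV cache, activations, and any layers stored at a different precision.

```python
def weight_size_gib(n_params, bits_per_weight):
    """Approximate size of the model weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# Assumption: nominal "7B" parameters; the exact count is not given in this README.
N_PARAMS = 7e9

# float16 matches the dtype used in the Usage section; 8.0 bits per weight
# matches the "8.0bpw" in the name of the EXL2 repository linked above.
for label, bpw in [("float16", 16), ("EXL2 8.0bpw", 8.0)]:
    print(f"{label}: ~{weight_size_gib(N_PARAMS, bpw):.1f} GiB")
```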
+ ## 🏆 Evaluation
+
+ ### Open LLM Leaderboard
+
+ NeuralBeagle14-7B ranks first on the Open LLM Leaderboard in the ~7B category.
+
+ ![](https://i.imgur.com/4nAzJsr.png)
+
+ It has the same average score as Beagle14-7B ("Show merges"), which might be due to an unlucky run.
+ I think I might be overexploiting argilla/distilabel-intel-orca-dpo-pairs at this point, since this dataset or its original version is present in multiple models.
+ I need to find more high-quality preference data for the next DPO merge.
+
+ Note that some models like udkai/Turdus and nfaheem/Marcoroni-7b-DPO-Merge are unfortunately contaminated on purpose (see the very high Winogrande score).
+
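The average score discussed above can be recomputed from the six results listed in this model card's metadata. The sketch assumes the leaderboard average is the unweighted mean of those six benchmarks; the dictionary keys are just labels.

```python
from statistics import mean

# Scores copied from the model-index metadata of this model card.
scores = {
    "ARC (25-shot)": 72.95,
    "HellaSwag (10-shot)": 88.34,
    "MMLU (5-shot)": 64.55,
    "TruthfulQA (0-shot)": 69.93,
    "Winogrande (5-shot)": 82.4,
    "GSM8k (5-shot)": 70.28,
}

# Assumption: the leaderboard average is the plain mean of the six scores.
average = mean(scores.values())
print(f"Average: {average:.2f}")
```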
+ ### Nous
+
+ The evaluation was performed using [LLM AutoEval](https://github.com/mlabonne/llm-autoeval) on the Nous suite. NeuralBeagle14-7B obtains the highest average score among the 7B models in the table, making it the best 7B model on this suite to date.
+
+ | Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
+ |---|---:|---:|---:|---:|---:|
+ | [**mlabonne/NeuralBeagle14-7B**](https://huggingface.co/mlabonne/NeuralBeagle14-7B) [📄](https://gist.github.com/mlabonne/ad0c665bbe581c8420136c3b52b3c15c) | **60.25** | **46.06** | **76.77** | **70.32** | **47.86** |
+ | [mlabonne/Beagle14-7B](https://huggingface.co/mlabonne/Beagle14-7B) [📄](https://gist.github.com/mlabonne/f5a5bf8c0827bbec2f05b97cc62d642c) | 59.4 | 44.38 | 76.53 | 69.44 | 47.25 |
+ | [mlabonne/NeuralDaredevil-7B](https://huggingface.co/mlabonne/NeuralDaredevil-7B) [📄](https://gist.github.com/mlabonne/cbeb077d1df71cb81c78f742f19f4155) | 59.39 | 45.23 | 76.2 | 67.61 | 48.52 |
+ | [argilla/distilabeled-Marcoro14-7B-slerp](https://huggingface.co/argilla/distilabeled-Marcoro14-7B-slerp) [📄](https://gist.github.com/mlabonne/9082c4e59f4d3f3543c5eda3f4807040) | 58.93 | 45.38 | 76.48 | 65.68 | 48.18 |
+ | [mlabonne/NeuralMarcoro14-7B](https://huggingface.co/mlabonne/NeuralMarcoro14-7B) [📄](https://gist.github.com/mlabonne/b31572a4711c945a4827e7242cfc4b9d) | 58.4 | 44.59 | 76.17 | 65.94 | 46.9 |
+ | [openchat/openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) [📄](https://gist.github.com/mlabonne/1afab87b543b0717ec08722cf086dcc3) | 53.71 | 44.17 | 73.72 | 52.53 | 44.4 |
+ | [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) [📄](https://gist.github.com/mlabonne/88b21dd9698ffed75d6163ebdc2f6cc8) | 52.42 | 42.75 | 72.99 | 52.99 | 40.94 |
+
+ You can find the complete benchmark on [YALL - Yet Another LLM Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard).
+
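To see what the DPO step changed relative to the base merge, the per-benchmark difference between the NeuralBeagle14-7B and Beagle14-7B rows of the table above can be computed directly. This is plain arithmetic on the published numbers; the variable names are illustrative.

```python
BENCHMARKS = ["AGIEval", "GPT4All", "TruthfulQA", "Bigbench"]

# Rows copied from the Nous table above.
neural_beagle = [46.06, 76.77, 70.32, 47.86]  # mlabonne/NeuralBeagle14-7B
beagle = [44.38, 76.53, 69.44, 47.25]         # mlabonne/Beagle14-7B (base model)

# Gain of the DPO fine-tune over its base model on each benchmark.
gains = {
    name: round(new - old, 2)
    for name, new, old in zip(BENCHMARKS, neural_beagle, beagle)
}
mean_gain = sum(gains.values()) / len(gains)

for name, gain in gains.items():
    print(f"{name}: {gain:+.2f}")
print(f"Mean gain: {mean_gain:+.2f}")
```

The mean gain agrees with the difference between the two Average entries in the table (60.25 versus 59.4).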
+ ## 💻 Usage
+
+ ```python
+ # Install the dependencies first (in a notebook cell, prefix the command with "!"):
+ # pip install -qU transformers accelerate
+
+ from transformers import AutoTokenizer
+ import transformers
+ import torch
+
+ model = "mlabonne/NeuralBeagle14-7B"
+ messages = [{"role": "user", "content": "What is a large language model?"}]
+
+ # Format the conversation with the tokenizer's chat template
+ tokenizer = AutoTokenizer.from_pretrained(model)
+ prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+ # Load the model in half precision and place it on the available devices
+ pipeline = transformers.pipeline(
+     "text-generation",
+     model=model,
+     torch_dtype=torch.float16,
+     device_map="auto",
+ )
+
+ # Sample up to 256 new tokens
+ outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
+ print(outputs[0]["generated_text"])
+ ```
+
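By default the text-generation pipeline returns the prompt followed by the completion in `generated_text`. If only the model's answer is wanted, one option is to strip the prompt afterwards, as in the sketch below; the helper `extract_completion` is hypothetical and the pipeline result is stubbed so the example runs without downloading the model.

```python
def extract_completion(prompt, outputs):
    """Return only the newly generated text from a text-generation pipeline result."""
    full_text = outputs[0]["generated_text"]
    # The pipeline echoes the prompt by default, so strip it when present.
    if full_text.startswith(prompt):
        return full_text[len(prompt):]
    return full_text


# Stubbed pipeline result standing in for `pipeline(prompt, ...)`.
prompt = "<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\n"
fake_outputs = [{"generated_text": prompt + "Hello! How can I help?"}]
print(extract_completion(prompt, fake_outputs))
```

Passing `return_full_text=False` to the pipeline call should have the same effect without post-processing.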
+ <p align="center">
+   <a href="https://github.com/argilla-io/distilabel">
+     <img src="https://raw.githubusercontent.com/argilla-io/distilabel/main/docs/assets/distilabel-badge-light.png" alt="Built with Distilabel" width="200" height="32"/>
+   </a>
+ </p>