# NeuralHermes 2.5 - Mistral 7B - LASER

This is an experimental LASER version of NeuralHermes using [laserRMT](https://i.imgur.com/gUlEJuU.jpg).
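LASER (LAyer-SElective Rank reduction) replaces selected weight matrices with low-rank approximations obtained by truncated SVD. The following is an illustrative sketch of that core operation only, not the laserRMT implementation; the matrix size and rank are arbitrary:

```python
import numpy as np

def low_rank_approx(W: np.ndarray, rank: int) -> np.ndarray:
    """Best rank-`rank` approximation of W via truncated SVD (Eckart-Young)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep only the top `rank` singular directions
    return (U[:, :rank] * S[:rank]) @ Vt[:rank, :]

# Toy stand-in for a transformer weight matrix
W = np.random.default_rng(0).standard_normal((64, 64))
W_laser = low_rank_approx(W, rank=8)
```

laserRMT additionally uses random-matrix-theory heuristics (Marchenko-Pastur) to decide which layers to reduce and how aggressively; the sketch above shows only the rank-reduction step itself.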

| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|------------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|

The code to train this model is available on [Google Colab](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing) and [GitHub](https://github.com/mlabonne/llm-course/tree/main). It required an A100 GPU for about an hour.

## Results

### AGIEval
| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |21.26|± | 2.57|
| | |acc_norm|22.83|± | 2.64|
|agieval_logiqa_en | 0|acc |39.32|± | 1.92|
| | |acc_norm|40.71|± | 1.93|
|agieval_lsat_ar | 0|acc |25.65|± | 2.89|
| | |acc_norm|25.65|± | 2.89|
|agieval_lsat_lr | 0|acc |48.82|± | 2.22|
| | |acc_norm|50.00|± | 2.22|
|agieval_lsat_rc | 0|acc |58.36|± | 3.01|
| | |acc_norm|57.25|± | 3.02|
|agieval_sat_en | 0|acc |74.27|± | 3.05|
| | |acc_norm|73.30|± | 3.09|
|agieval_sat_en_without_passage| 0|acc |43.69|± | 3.46|
| | |acc_norm|42.23|± | 3.45|
|agieval_sat_math | 0|acc |37.27|± | 3.27|
| | |acc_norm|36.36|± | 3.25|

Average: 43.54%
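As a quick sanity check, the AGIEval section average matches the unweighted mean of the `acc_norm` values in the table (a hypothetical verification, not part of the evaluation harness):

```python
# acc_norm values from the AGIEval table, in row order
acc_norm = [22.83, 40.71, 25.65, 50.00, 57.25, 73.30, 42.23, 36.36]
average = sum(acc_norm) / len(acc_norm)
print(round(average, 2))  # 43.54
```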

### GPT4All
| Task |Version| Metric |Value| |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge| 0|acc |57.76|± | 1.44|
| | |acc_norm|60.32|± | 1.43|
|arc_easy | 0|acc |83.84|± | 0.76|
| | |acc_norm|81.10|± | 0.80|
|boolq | 1|acc |86.70|± | 0.59|
|hellaswag | 0|acc |63.15|± | 0.48|
| | |acc_norm|82.55|± | 0.38|
|openbookqa | 0|acc |34.40|± | 2.13|
| | |acc_norm|45.20|± | 2.23|
|piqa | 0|acc |81.94|± | 0.90|
| | |acc_norm|82.97|± | 0.88|
|winogrande | 0|acc |75.22|± | 1.21|

Average: 73.44%

### TruthfulQA
| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc| 1|mc1 |37.70|± | 1.70|
| | |mc2 |55.26|± | 1.52|

Average: 55.26%

### Bigbench
| Task |Version| Metric |Value| |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|53.16|± | 3.63|
|bigbench_date_understanding | 0|multiple_choice_grade|65.31|± | 2.48|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|34.11|± | 2.96|
|bigbench_geometric_shapes | 0|multiple_choice_grade|27.02|± | 2.35|
| | |exact_str_match | 0.28|± | 0.28|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|27.80|± | 2.01|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|19.86|± | 1.51|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|48.33|± | 2.89|
|bigbench_movie_recommendation | 0|multiple_choice_grade|41.40|± | 2.20|
|bigbench_navigate | 0|multiple_choice_grade|50.00|± | 1.58|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|65.00|± | 1.07|
|bigbench_ruin_names | 0|multiple_choice_grade|46.21|± | 2.36|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|27.25|± | 1.41|
|bigbench_snarks | 0|multiple_choice_grade|70.72|± | 3.39|
|bigbench_sports_understanding | 0|multiple_choice_grade|65.72|± | 1.51|
|bigbench_temporal_sequences | 0|multiple_choice_grade|30.40|± | 1.46|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|22.56|± | 1.18|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|17.09|± | 0.90|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|48.33|± | 2.89|

Average: 42.24%

Average score: 53.62%
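The overall score is the unweighted mean of the four category averages (a quick check, rounding to two decimals):

```python
# Per-category averages reported in the sections above
category_averages = {"AGIEval": 43.54, "GPT4All": 73.44, "TruthfulQA": 55.26, "Bigbench": 42.24}
overall = sum(category_averages.values()) / len(category_averages)
print(round(overall, 2))  # 53.62
```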

## Usage

```python
# ...

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model="mlabonne/NeuralHermes-2.5-Mistral-7B-laser",
    tokenizer=tokenizer
)

# ...
sequences = pipeline(
    prompt,
    # ...
    max_length=200,
)
print(sequences[0]['generated_text'])
```
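Hermes-family models use the ChatML prompt format, which is what `tokenizer.apply_chat_template` produces in the snippet above. For illustration, here is a hand-rolled equivalent (a sketch only; in practice rely on the tokenizer's own chat template):

```python
def chatml_prompt(messages, add_generation_prompt=True):
    """Format {role, content} messages in ChatML, the Hermes chat format."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn for the model to complete
        parts.append("<|im_start|>assistant")
    return "\n".join(parts)

message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"},
]
print(chatml_prompt(message))
```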