Commit 76efb2d by mlabonne (parent: b838826): Create README.md (+129 lines)
---
base_model: teknium/OpenHermes-2.5-Mistral-7B
tags:
- mistral
- instruct
- finetune
- chatml
- gpt4
- synthetic data
- distillation
- dpo
- rlhf
- laser
license: apache-2.0
language:
- en
datasets:
- mlabonne/chatml_dpo_pairs
---

<center><img src="https://i.imgur.com/qIhaFNM.png"></center>

# NeuralHermes 2.5 - Mistral 7B - LASER

This is an experimental LASER version of NeuralHermes built with [laserRMT](https://github.com/cognitivecomputations/laserRMT).

Special thanks to Fernando Fernandes Neto and Eric Hartford, "Optimizing Large Language Models Using Layer-Selective Rank Reduction and Random Matrix Theory," 2024.
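
At its core, LASER-style layer-selective rank reduction replaces a chosen weight matrix with a low-rank approximation obtained via truncated SVD. The sketch below shows only that core operation on a toy matrix; it is not the laserRMT implementation, and it omits the Random-Matrix-Theory criterion used to select which layers and ranks to reduce:

```python
import numpy as np

def rank_reduce(W: np.ndarray, k: int) -> np.ndarray:
    """Return the best rank-k approximation of W (Eckart-Young) via truncated SVD."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * S[:k]) @ Vt[:k, :]

# Toy example: a 64x64 "weight matrix" that is rank 8 plus small noise
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 64))
W = low_rank + 0.01 * rng.standard_normal((64, 64))

W_reduced = rank_reduce(W, k=8)
# The rank-8 approximation keeps almost all of the matrix's energy
rel_error = np.linalg.norm(W - W_reduced) / np.linalg.norm(W)
print(f"relative Frobenius error: {rel_error:.4f}")
```

The intuition behind the technique is that the discarded small singular values mostly encode noise, so pruning them can preserve (or even improve) downstream quality.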

NeuralHermes is a [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) model that has been further fine-tuned with Direct Preference Optimization (DPO) on the [mlabonne/chatml_dpo_pairs](https://huggingface.co/datasets/mlabonne/chatml_dpo_pairs) dataset. It surpasses the original model on several benchmarks (see results below).
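
DPO optimizes the policy directly on preference pairs, with no separate reward model: for each (chosen, rejected) pair it pushes the policy's log-ratio margin over the reference model through a sigmoid, scaled by beta. A minimal sketch of the per-pair loss, with illustrative log-probabilities only:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin),
    where the margin compares policy vs. reference log-ratios."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# If the policy already prefers the chosen answer more than the reference does,
# the margin is positive and the loss drops below log(2)
loss = dpo_loss(-10.0, -14.0, -12.0, -13.0, beta=0.1)
print(f"{loss:.4f}")
```

In practice the TRL `DPOTrainer` used for this model computes exactly this objective batched over sequence-level log-probabilities.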

It is directly inspired by the RLHF process described by the authors of [Intel/neural-chat-7b-v3-1](https://huggingface.co/Intel/neural-chat-7b-v3-1) to improve performance. I used the same dataset and reformatted it to apply the ChatML template.

The code to train this model is available on [Google Colab](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing) and [GitHub](https://github.com/mlabonne/llm-course/tree/main). Training took about an hour on an A100 GPU.
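
The ChatML template referenced above wraps each conversation turn in `<|im_start|>`/`<|im_end|>` tokens. The real conversion should use the tokenizer's `apply_chat_template`; this hand-rolled version is only a sketch of what the rendered text looks like:

```python
def to_chatml(messages, add_generation_prompt=False):
    """Render a list of {role, content} dicts in ChatML format."""
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out

prompt = to_chatml(
    [{"role": "system", "content": "You are a helpful assistant chatbot."},
     {"role": "user", "content": "What is a Large Language Model?"}],
    add_generation_prompt=True,
)
print(prompt)
```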

### Quantized models

* GGUF: https://huggingface.co/TheBloke/NeuralHermes-2.5-Mistral-7B-GGUF
* AWQ: https://huggingface.co/TheBloke/NeuralHermes-2.5-Mistral-7B-AWQ
* GPTQ: https://huggingface.co/TheBloke/NeuralHermes-2.5-Mistral-7B-GPTQ
* EXL2:
  * 3.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-3.0bpw-h6-exl2
  * 4.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-4.0bpw-h6-exl2
  * 5.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-5.0bpw-h6-exl2
  * 6.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-6.0bpw-h6-exl2
  * 8.0bpw: https://huggingface.co/LoneStriker/NeuralHermes-2.5-Mistral-7B-8.0bpw-h8-exl2
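
The EXL2 bits-per-weight figures translate almost directly into on-disk size: a model with N parameters at b bits per weight needs roughly N × b / 8 bytes, plus some overhead. A back-of-the-envelope calculation, assuming Mistral 7B's parameter count of roughly 7.24B (sizes are approximate):

```python
PARAMS = 7.24e9  # Mistral 7B parameter count (approximate)

def approx_size_gb(bits_per_weight: float) -> float:
    """Approximate on-disk size in GB for a given quantization level."""
    return PARAMS * bits_per_weight / 8 / 1e9

for bpw in (3.0, 4.0, 5.0, 6.0, 8.0):
    print(f"{bpw:.1f} bpw ~ {approx_size_gb(bpw):.1f} GB")
```

For comparison, the unquantized fp16 weights (16 bits per weight) come to roughly 14.5 GB by the same estimate.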

## Results

**Update:** NeuralHermes-2.5 became the best Hermes-based model on the Open LLM Leaderboard and one of the very best 7B models. 🎉

![image/png](https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/yWe6VBFxkHiuOlDVBXtGo.png)

Teknium (author of OpenHermes-2.5-Mistral-7B) benchmarked the model ([see his tweet](https://twitter.com/Teknium1/status/1729955709377503660)).

Results improved on every benchmark: **AGIEval** (from 43.07% to 43.62%), **GPT4All** (from 73.12% to 73.25%), and **TruthfulQA**.

### AGIEval
![](https://i.imgur.com/7an3B1f.png)

### GPT4All
![](https://i.imgur.com/TLxZFi9.png)

### TruthfulQA
![](https://i.imgur.com/V380MqD.png)

You can check the Weights & Biases project [here](https://wandb.ai/mlabonne/NeuralHermes-2-5-Mistral-7B/overview?workspace=user-mlabonne).

## Usage

You can run this model using [LM Studio](https://lmstudio.ai/) or any other frontend.

You can also run it with the following code:

```python
import transformers
from transformers import AutoTokenizer

# This model's repo id on the Hugging Face Hub
model_id = "mlabonne/NeuralHermes-2.5-Mistral-7B-laser"

# Format the prompt with the ChatML template
messages = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"}
]
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

# Create a text-generation pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer
)

# Generate text
sequences = pipeline(
    prompt,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    num_return_sequences=1,
    max_length=200,
)
print(sequences[0]['generated_text'])
```

## Training hyperparameters

**LoRA**:
* r=16
* lora_alpha=16
* lora_dropout=0.05
* bias="none"
* task_type="CAUSAL_LM"
* target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']

**Training arguments**:
* per_device_train_batch_size=4
* gradient_accumulation_steps=4
* gradient_checkpointing=True
* learning_rate=5e-5
* lr_scheduler_type="cosine"
* max_steps=200
* optim="paged_adamw_32bit"
* warmup_steps=100

**DPOTrainer**:
* beta=0.1
* max_prompt_length=1024
* max_length=1536

**DPOTrainer**:
* beta=0.1
* max_prompt_length=1024
* max_length=1536
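
From the settings above, the effective batch size and the total number of preference pairs processed follow directly (assuming a single GPU, consistent with the one-A100 training run mentioned earlier):

```python
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 200
num_gpus = 1  # assumption: single A100, as described above

# Gradient accumulation multiplies the per-device batch into one optimizer step
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
pairs_seen = effective_batch_size * max_steps
print(effective_batch_size, pairs_seen)  # 16 pairs per optimizer step, 3200 total
```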