# NeuralHermes 2.5 - Mistral 7B - LASER

This is an experimental LASER version of NeuralHermes using [laserRMT](https://i.imgur.com/gUlEJuU.jpg).
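LASER (LAyer-SElective Rank reduction) replaces selected weight matrices with low-rank approximations obtained by truncated SVD. The following is an illustrative sketch of that core operation only, not the laserRMT implementation; the matrix size and rank are arbitrary:

```python
import numpy as np

def low_rank_approx(W: np.ndarray, rank: int) -> np.ndarray:
    """Best rank-`rank` approximation of W via truncated SVD (Eckart-Young)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep only the top `rank` singular directions
    return (U[:, :rank] * S[:rank]) @ Vt[:rank, :]

# Toy stand-in for a transformer weight matrix
W = np.random.default_rng(0).standard_normal((64, 64))
W_laser = low_rank_approx(W, rank=8)
```

laserRMT additionally uses random-matrix-theory heuristics (Marchenko-Pastur) to decide which layers to reduce and how aggressively; the sketch above shows only the rank-reduction step itself.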

| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
|------------------------------------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|

The code to train this model is available on [Google Colab](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing) and [GitHub](https://github.com/mlabonne/llm-course/tree/main). It required an A100 GPU for about an hour.

## Results

### AGIEval
| Task |Version| Metric |Value| |Stderr|
|------------------------------|------:|--------|----:|---|-----:|
|agieval_aqua_rat | 0|acc |21.26|± | 2.57|
| | |acc_norm|22.83|± | 2.64|
|agieval_logiqa_en | 0|acc |39.32|± | 1.92|
| | |acc_norm|40.71|± | 1.93|
|agieval_lsat_ar | 0|acc |25.65|± | 2.89|
| | |acc_norm|25.65|± | 2.89|
|agieval_lsat_lr | 0|acc |48.82|± | 2.22|
| | |acc_norm|50.00|± | 2.22|
|agieval_lsat_rc | 0|acc |58.36|± | 3.01|
| | |acc_norm|57.25|± | 3.02|
|agieval_sat_en | 0|acc |74.27|± | 3.05|
| | |acc_norm|73.30|± | 3.09|
|agieval_sat_en_without_passage| 0|acc |43.69|± | 3.46|
| | |acc_norm|42.23|± | 3.45|
|agieval_sat_math | 0|acc |37.27|± | 3.27|
| | |acc_norm|36.36|± | 3.25|

Average: 43.54%
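As a quick sanity check, the AGIEval section average matches the unweighted mean of the `acc_norm` values in the table (a hypothetical verification, not part of the evaluation harness):

```python
# acc_norm values from the AGIEval table, in row order
acc_norm = [22.83, 40.71, 25.65, 50.00, 57.25, 73.30, 42.23, 36.36]
average = sum(acc_norm) / len(acc_norm)
print(round(average, 2))  # 43.54
```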

### GPT4All
| Task |Version| Metric |Value| |Stderr|
|-------------|------:|--------|----:|---|-----:|
|arc_challenge| 0|acc |57.76|± | 1.44|
| | |acc_norm|60.32|± | 1.43|
|arc_easy | 0|acc |83.84|± | 0.76|
| | |acc_norm|81.10|± | 0.80|
|boolq | 1|acc |86.70|± | 0.59|
|hellaswag | 0|acc |63.15|± | 0.48|
| | |acc_norm|82.55|± | 0.38|
|openbookqa | 0|acc |34.40|± | 2.13|
| | |acc_norm|45.20|± | 2.23|
|piqa | 0|acc |81.94|± | 0.90|
| | |acc_norm|82.97|± | 0.88|
|winogrande | 0|acc |75.22|± | 1.21|

Average: 73.44%

### TruthfulQA
| Task |Version|Metric|Value| |Stderr|
|-------------|------:|------|----:|---|-----:|
|truthfulqa_mc| 1|mc1 |37.70|± | 1.70|
| | |mc2 |55.26|± | 1.52|

Average: 55.26%

### Bigbench
| Task |Version| Metric |Value| |Stderr|
|------------------------------------------------|------:|---------------------|----:|---|-----:|
|bigbench_causal_judgement | 0|multiple_choice_grade|53.16|± | 3.63|
|bigbench_date_understanding | 0|multiple_choice_grade|65.31|± | 2.48|
|bigbench_disambiguation_qa | 0|multiple_choice_grade|34.11|± | 2.96|
|bigbench_geometric_shapes | 0|multiple_choice_grade|27.02|± | 2.35|
| | |exact_str_match | 0.28|± | 0.28|
|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|27.80|± | 2.01|
|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|19.86|± | 1.51|
|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|48.33|± | 2.89|
|bigbench_movie_recommendation | 0|multiple_choice_grade|41.40|± | 2.20|
|bigbench_navigate | 0|multiple_choice_grade|50.00|± | 1.58|
|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|65.00|± | 1.07|
|bigbench_ruin_names | 0|multiple_choice_grade|46.21|± | 2.36|
|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|27.25|± | 1.41|
|bigbench_snarks | 0|multiple_choice_grade|70.72|± | 3.39|
|bigbench_sports_understanding | 0|multiple_choice_grade|65.72|± | 1.51|
|bigbench_temporal_sequences | 0|multiple_choice_grade|30.40|± | 1.46|
|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|22.56|± | 1.18|
|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|17.09|± | 0.90|
|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|48.33|± | 2.89|

Average: 42.24%

Average score: 53.62%
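The overall score is the unweighted mean of the four category averages (a quick check, rounding to two decimals):

```python
# Per-category averages reported in the sections above
category_averages = {"AGIEval": 43.54, "GPT4All": 73.44, "TruthfulQA": 55.26, "Bigbench": 42.24}
overall = sum(category_averages.values()) / len(category_averages)
print(round(overall, 2))  # 53.62
```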

## Usage

```python
# ...

# Create pipeline
pipeline = transformers.pipeline(
    "text-generation",
    model="mlabonne/NeuralHermes-2.5-Mistral-7B-laser",
    tokenizer=tokenizer
)

# ...
sequences = pipeline(
    prompt,
    # ...
    max_length=200,
)
print(sequences[0]['generated_text'])
```
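Hermes-family models use the ChatML prompt format, which is what `tokenizer.apply_chat_template` produces in the snippet above. For illustration, here is a hand-rolled equivalent (a sketch only; in practice rely on the tokenizer's own chat template):

```python
def chatml_prompt(messages, add_generation_prompt=True):
    """Format {role, content} messages in ChatML, the Hermes chat format."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    if add_generation_prompt:
        # Open an assistant turn for the model to complete
        parts.append("<|im_start|>assistant")
    return "\n".join(parts)

message = [
    {"role": "system", "content": "You are a helpful assistant chatbot."},
    {"role": "user", "content": "What is a Large Language Model?"},
]
print(chatml_prompt(message))
```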