language:
  - en
license: apache-2.0
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - mistral
  - trl
  - sft
base_model: alpindale/Mistral-7B-v0.2

Mistral-7B-v0.2-OpenHermes


SFT Training Params:

  • Learning Rate: 2e-4
  • Batch Size: 8
  • Gradient Accumulation steps: 4
  • Dataset: teknium/OpenHermes-2.5 (the 200k split used is slightly biased towards roleplay and theory of life)
  • r: 16
  • Lora Alpha: 16
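
To make the interaction between these settings concrete, here is a small sketch of the arithmetic. It assumes a single GPU, that the "200k split" holds exactly 200,000 examples, and that the standard LoRA scaling of alpha / r applies. None of these are stated on this card, so treat the step count as a rough estimate.

```python
import math

# Values taken from the parameter list above.
batch_size = 8
grad_accum_steps = 4
lora_r = 16
lora_alpha = 16

# Assumptions (not stated on this card): one GPU, exactly 200,000 examples.
num_gpus = 1
num_examples = 200_000

# Number of examples that contribute to each optimizer step.
effective_batch = batch_size * grad_accum_steps * num_gpus

# Standard LoRA scales the adapter update by alpha / r.
lora_scaling = lora_alpha / lora_r

# Optimizer steps needed for one pass over the split.
steps_per_epoch = math.ceil(num_examples / effective_batch)

print(effective_batch, lora_scaling, steps_per_epoch)  # 32 1.0 6250
```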

Training Time: 13 hours on A100

This model performs well in retrieval-augmented generation (RAG) use cases. Further fine-tuning on RAG data from your own domain is still recommended.

Prompt Template: ChatML

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What's the capital of France?<|im_end|>
<|im_start|>assistant
Paris.
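
If you need to build this prompt by hand (for example, when calling a raw completion endpoint), the template above can be sketched as a small formatter. The function name `to_chatml` and its `add_generation_prompt` parameter are illustrative, not part of this model's tooling. Chat servers usually apply the template for you.

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML prompt string."""
    parts = []
    for message in messages:
        # Each turn is wrapped in <|im_start|>ROLE ... <|im_end|>.
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What's the capital of France?"},
    ]
)
print(prompt)
```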

Run easily with ollama

ollama run macadeliccc/mistral-7b-v2-openhermes
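
Once the model is available in Ollama, it can also be queried programmatically. The sketch below uses only the Python standard library and assumes Ollama is listening on its default local address, `http://localhost:11434`, with the model pulled under the tag shown above. The helper names `build_ollama_request` and `ask` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama's default local chat endpoint.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"
MODEL_TAG = "macadeliccc/mistral-7b-v2-openhermes"


def build_ollama_request(prompt, system="You are a helpful assistant."):
    """Build a non-streaming chat request for the Ollama REST API."""
    payload = {
        "model": MODEL_TAG,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        "stream": False,  # ask for one JSON object instead of a stream
    }
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def ask(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_ollama_request(prompt)) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["message"]["content"]
```

With the Ollama server running, `ask("What's the capital of France?")` would return the model's reply as a string.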

OpenAI compatible server with vLLM

Installation instructions for vLLM can be found in the vLLM documentation.

# --gpu-memory-utilization can go as low as 0.83-0.85 if you need a little more GPU memory for your application.
# --max-model-len 16000 works on a 4090; use 32000 if you can run it.
python -m vllm.entrypoints.openai.api_server \
  --model macadeliccc/Mistral-7B-v0.2-OpenHermes \
  --gpu-memory-utilization 0.9 \
  --max-model-len 16000 \
  --chat-template ./examples/template_chatml.jinja
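
To get a feel for why `--max-model-len` matters on a 24 GB card, the following sketch estimates the key/value cache needed for one full-length sequence. It assumes the usual Mistral-7B layout (32 layers, 8 key/value heads, head dimension 128) and a 16-bit cache. These figures are assumptions, not values stated on this card, and the estimate ignores model weights and other runtime overhead.

```python
# Assumed Mistral-7B architecture values (not stated on this card).
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # 16-bit cache entries


def kv_cache_bytes(num_tokens):
    """Approximate KV cache size in bytes for a single sequence."""
    # Factor of 2 covers the separate key and value tensors in each layer.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens


for context_length in (16_000, 32_000):
    gib = kv_cache_bytes(context_length) / 2**30
    print(f"{context_length} tokens: about {gib:.2f} GiB of KV cache")
```

Under these assumptions, doubling the context length doubles the cache needed per sequence, which is memory that must fit alongside the model weights.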

Gradio chatbot interface for your endpoint

import gradio as gr
from openai import OpenAI

# Modify these variables as needed
openai_api_key = "EMPTY"  # Assuming no API key is required for local testing
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)
system_message = "You are a helpful assistant"

def fast_echo(message, history):
    # Send the user's message to the vLLM API and get the response immediately
   
    chat_response = client.chat.completions.create(
        model="macadeliccc/Mistral-7B-v0.2-OpenHermes",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": message},
        ]
    )
    print(chat_response)
    return chat_response.choices[0].message.content

demo = gr.ChatInterface(fn=fast_echo, examples=["Write me a quicksort algorithm in python."]).queue()

if __name__ == "__main__":
    demo.launch()
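
Note that `fast_echo` sends only the latest message, so the model does not see earlier turns. A minimal sketch of a helper that folds the chat history into the request is shown below. The name `history_to_messages` is hypothetical, and the history format is an assumption: depending on the Gradio version and settings, history arrives either as `[user, assistant]` pairs or as role/content dicts, so the helper accepts both.

```python
def history_to_messages(history, system_message, user_message):
    """Convert Gradio chat history into an OpenAI-style messages list."""
    messages = [{"role": "system", "content": system_message}]
    for turn in history:
        if isinstance(turn, dict):
            # "messages" format: {"role": ..., "content": ...}
            messages.append({"role": turn["role"], "content": turn["content"]})
        else:
            # Pair format: [user_text, assistant_text]
            user_text, assistant_text = turn
            messages.append({"role": "user", "content": user_text})
            if assistant_text is not None:
                messages.append({"role": "assistant", "content": assistant_text})
    # The new user message always goes last.
    messages.append({"role": "user", "content": user_message})
    return messages
```

Inside `fast_echo`, the `messages=[...]` argument could then be replaced with `messages=history_to_messages(history, system_message, message)`.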

Quantizations

GGUF

AWQ

HQQ-4bit

ExLlamaV2

Evaluations

Thanks to Maxime Labonne for the evaluation:

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Mistral-7B-v0.2-OpenHermes | 35.57 | 67.15 | 42.06 | 36.27 | 45.26 |

AGIEval

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 24.02 | ± 2.69 |
| | | acc_norm | 21.65 | ± 2.59 |
| agieval_logiqa_en | 0 | acc | 28.11 | ± 1.76 |
| | | acc_norm | 34.56 | ± 1.87 |
| agieval_lsat_ar | 0 | acc | 27.83 | ± 2.96 |
| | | acc_norm | 23.48 | ± 2.80 |
| agieval_lsat_lr | 0 | acc | 33.73 | ± 2.10 |
| | | acc_norm | 33.14 | ± 2.09 |
| agieval_lsat_rc | 0 | acc | 48.70 | ± 3.05 |
| | | acc_norm | 39.78 | ± 2.99 |
| agieval_sat_en | 0 | acc | 67.48 | ± 3.27 |
| | | acc_norm | 64.56 | ± 3.34 |
| agieval_sat_en_without_passage | 0 | acc | 38.83 | ± 3.40 |
| | | acc_norm | 37.38 | ± 3.38 |
| agieval_sat_math | 0 | acc | 32.27 | ± 3.16 |
| | | acc_norm | 30.00 | ± 3.10 |

Average: 35.57%

GPT4All

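| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| arc_challenge | 0 | acc | 45.05 | ± 1.45 |
| | | acc_norm | 48.46 | ± 1.46 |
| arc_easy | 0 | acc | 77.27 | ± 0.86 |
| | | acc_norm | 73.78 | ± 0.90 |
| boolq | 1 | acc | 68.62 | ± 0.81 |
| hellaswag | 0 | acc | 59.63 | ± 0.49 |
| | | acc_norm | 79.66 | ± 0.40 |
| openbookqa | 0 | acc | 31.40 | ± 2.08 |
| | | acc_norm | 43.40 | ± 2.22 |
| piqa | 0 | acc | 80.25 | ± 0.93 |
| | | acc_norm | 82.05 | ± 0.90 |
| winogrande | 0 | acc | 74.11 | ± 1.23 |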

Average: 67.15%
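
The card does not say how this suite average is computed. One rule that reproduces the reported figure is to take `acc_norm` for each task where it exists and fall back to `acc` otherwise, then take the plain mean. The sketch below checks that inferred rule against the GPT4All values above. The function name `suite_average` is illustrative.

```python
# GPT4All results copied from the table above.
gpt4all = {
    "arc_challenge": {"acc": 45.05, "acc_norm": 48.46},
    "arc_easy": {"acc": 77.27, "acc_norm": 73.78},
    "boolq": {"acc": 68.62},
    "hellaswag": {"acc": 59.63, "acc_norm": 79.66},
    "openbookqa": {"acc": 31.40, "acc_norm": 43.40},
    "piqa": {"acc": 80.25, "acc_norm": 82.05},
    "winogrande": {"acc": 74.11},
}


def suite_average(results):
    """Mean of acc_norm per task, falling back to acc when it is missing."""
    scores = [metrics.get("acc_norm", metrics["acc"]) for metrics in results.values()]
    return round(sum(scores) / len(scores), 2)


print(suite_average(gpt4all))  # 67.15
```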

TruthfulQA

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| truthfulqa_mc | 1 | mc1 | 27.54 | ± 1.56 |
| | | mc2 | 42.06 | ± 1.44 |

Average: 42.06%

Bigbench

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| bigbench_causal_judgement | 0 | multiple_choice_grade | 56.32 | ± 3.61 |
| bigbench_date_understanding | 0 | multiple_choice_grade | 66.40 | ± 2.46 |
| bigbench_disambiguation_qa | 0 | multiple_choice_grade | 45.74 | ± 3.11 |
| bigbench_geometric_shapes | 0 | multiple_choice_grade | 10.58 | ± 1.63 |
| | | exact_str_match | 0.00 | ± 0.00 |
| bigbench_logical_deduction_five_objects | 0 | multiple_choice_grade | 25.00 | ± 1.94 |
| bigbench_logical_deduction_seven_objects | 0 | multiple_choice_grade | 17.71 | ± 1.44 |
| bigbench_logical_deduction_three_objects | 0 | multiple_choice_grade | 37.33 | ± 2.80 |
| bigbench_movie_recommendation | 0 | multiple_choice_grade | 29.40 | ± 2.04 |
| bigbench_navigate | 0 | multiple_choice_grade | 50.00 | ± 1.58 |
| bigbench_reasoning_about_colored_objects | 0 | multiple_choice_grade | 42.50 | ± 1.11 |
| bigbench_ruin_names | 0 | multiple_choice_grade | 39.06 | ± 2.31 |
| bigbench_salient_translation_error_detection | 0 | multiple_choice_grade | 12.93 | ± 1.06 |
| bigbench_snarks | 0 | multiple_choice_grade | 69.06 | ± 3.45 |
| bigbench_sports_understanding | 0 | multiple_choice_grade | 49.80 | ± 1.59 |
| bigbench_temporal_sequences | 0 | multiple_choice_grade | 26.50 | ± 1.40 |
| bigbench_tracking_shuffled_objects_five_objects | 0 | multiple_choice_grade | 21.20 | ± 1.16 |
| bigbench_tracking_shuffled_objects_seven_objects | 0 | multiple_choice_grade | 16.06 | ± 0.88 |
| bigbench_tracking_shuffled_objects_three_objects | 0 | multiple_choice_grade | 37.33 | ± 2.80 |

Average: 36.27%

Average score: 45.26%
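
The overall score is consistent with an unweighted mean of the four suite averages reported above, as this short check shows:

```python
# Suite averages as reported in the sections above.
suite_scores = {
    "AGIEval": 35.57,
    "GPT4All": 67.15,
    "TruthfulQA": 42.06,
    "Bigbench": 36.27,
}

# Unweighted mean across the four suites.
overall = round(sum(suite_scores.values()) / len(suite_scores), 2)
print(overall)  # 45.26
```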

Elapsed time: 01:49:22

  • Developed by: macadeliccc
  • License: apache-2.0
  • Finetuned from model: alpindale/Mistral-7B-v0.2

This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.