metadata
language:
- en
license: apache-2.0
tags:
- merge
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
- ehartford/dolphin-2.2.1-mistral-7b
- SciPhi/SciPhi-Mistral-7B-32k
- ehartford/samantha-1.2-mistral-7b
- Arc53/docsgpt-7b-mistral
- HuggingFaceH4/zephyr-7b-beta
- meta-math/MetaMath-Mistral-7B
- Open-Orca/Mistral-7B-OpenOrca
- openchat/openchat-3.5-1210
- beowolx/MistralHermes-CodePro-7B-v1
- TIGER-Lab/MAmmoTH-7B-Mistral
- teknium/OpenHermes-2.5-Mistral-7B
- Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp
- mlabonne/NeuralHermes-2.5-Mistral-7B
model-index:
- name: Mistral-7B-Merge-14-v0.3
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: AI2 Reasoning Challenge (25-Shot)
type: ai2_arc
config: ARC-Challenge
split: test
args:
num_few_shot: 25
metrics:
- type: acc_norm
value: 65.96
name: normalized accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=EmbeddedLLM/Mistral-7B-Merge-14-v0.3
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: HellaSwag (10-Shot)
type: hellaswag
split: validation
args:
num_few_shot: 10
metrics:
- type: acc_norm
value: 85.29
name: normalized accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=EmbeddedLLM/Mistral-7B-Merge-14-v0.3
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU (5-Shot)
type: cais/mmlu
config: all
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 64.35
name: accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=EmbeddedLLM/Mistral-7B-Merge-14-v0.3
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: TruthfulQA (0-shot)
type: truthful_qa
config: multiple_choice
split: validation
args:
num_few_shot: 0
metrics:
- type: mc2
value: 57.8
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=EmbeddedLLM/Mistral-7B-Merge-14-v0.3
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: Winogrande (5-shot)
type: winogrande
config: winogrande_xl
split: validation
args:
num_few_shot: 5
metrics:
- type: acc
value: 78.3
name: accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=EmbeddedLLM/Mistral-7B-Merge-14-v0.3
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GSM8k (5-shot)
type: gsm8k
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 66.26
name: accuracy
source:
url: >-
https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=EmbeddedLLM/Mistral-7B-Merge-14-v0.3
name: Open LLM Leaderboard
Update 2024-01-03
Check out our v0.4 model which is based on this and achieves better average score of 71.19 versus 69.66.
Model Description
This is an update to EmbeddedLLM/Mistral-7B-Merge-14-v0.2 that removes potentially TruthfulQA-contaminated models and non-commercially licensed models:
- berkeley-nest/Starling-LM-7B-alpha
- Q-bert/MetaMath-Cybertron-Starling
- v1olet/v1olet_marcoroni-go-bruins-merge-7B
This is an experiment to test merging 14 models using DARE TIES 🦙
The result is a base model that performs quite well but may need some further chat fine-tuning.
The 14 models are as follows:
- mistralai/Mistral-7B-Instruct-v0.2
- ehartford/dolphin-2.2.1-mistral-7b
- SciPhi/SciPhi-Mistral-7B-32k
- ehartford/samantha-1.2-mistral-7b
- Arc53/docsgpt-7b-mistral
- HuggingFaceH4/zephyr-7b-beta
- meta-math/MetaMath-Mistral-7B
- Open-Orca/Mistral-7B-OpenOrca
- openchat/openchat-3.5-1210
- beowolx/MistralHermes-CodePro-7B-v1
- TIGER-Lab/MAmmoTH-7B-Mistral
- teknium/OpenHermes-2.5-Mistral-7B
- Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp
- mlabonne/NeuralHermes-2.5-Mistral-7B
- base model: mistralai/Mistral-7B-v0.1
Open LLM Leaderboard
v0.3 | v0.4 | |
---|---|---|
Average | 69.66 | 71.19 |
ARC | 65.96 | 66.81 |
HellaSwag | 85.29 | 86.15 |
MMLU | 64.35 | 65.10 |
TruthfulQA | 57.80 | 58.25 |
Winogrande | 78.30 | 80.03 |
GSM8K | 66.26 | 70.81 |
Chat Template
We tried ChatML and Llama-2 chat template, but feel free to try other templates.
Merge Configuration
The merge config file for this model is here:
models:
- model: mistralai/Mistral-7B-v0.1
# no parameters necessary for base model
- model: ehartford/dolphin-2.2.1-mistral-7b
parameters:
weight: 0.08
density: 0.4
- model: SciPhi/SciPhi-Mistral-7B-32k
parameters:
weight: 0.08
density: 0.4
- model: ehartford/samantha-1.2-mistral-7b
parameters:
weight: 0.08
density: 0.4
- model: Arc53/docsgpt-7b-mistral
parameters:
weight: 0.08
density: 0.4
- model: HuggingFaceH4/zephyr-7b-beta
parameters:
weight: 0.08
density: 0.4
- model: meta-math/MetaMath-Mistral-7B
parameters:
weight: 0.08
density: 0.4
- model: Open-Orca/Mistral-7B-OpenOrca
parameters:
weight: 0.08
density: 0.4
- model: openchat/openchat-3.5-1210
parameters:
weight: 0.08
density: 0.4
- model: beowolx/MistralHermes-CodePro-7B-v1
parameters:
weight: 0.08
density: 0.4
- model: TIGER-Lab/MAmmoTH-7B-Mistral
parameters:
weight: 0.08
density: 0.4
- model: teknium/OpenHermes-2.5-Mistral-7B
parameters:
weight: 0.08
density: 0.4
- model: Weyaxi/OpenHermes-2.5-neural-chat-v3-3-Slerp
parameters:
weight: 0.08
density: 0.4
- model: mlabonne/NeuralHermes-2.5-Mistral-7B
parameters:
weight: 0.08
density: 0.4
- model: mistralai/Mistral-7B-Instruct-v0.2
parameters:
weight: 0.08
density: 0.5
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
int8_mask: true
dtype: bfloat16
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 69.66 |
AI2 Reasoning Challenge (25-Shot) | 65.96 |
HellaSwag (10-Shot) | 85.29 |
MMLU (5-Shot) | 64.35 |
TruthfulQA (0-shot) | 57.80 |
Winogrande (5-shot) | 78.30 |
GSM8k (5-shot) | 66.26 |