---
base_model:
- mistralai/Mistral-7B-v0.1
- argilla/distilabeled-OpenHermes-2.5-Mistral-7B
- NeverSleep/Noromaid-7B-0.4-DPO
- senseable/WestLake-7B-v2
- mlabonne/AlphaMonarch-7B
library_name: transformers
tags:
- mergekit
- merge
license: cc-by-nc-4.0
model-index:
- name: WestLake_Noromaid_OpenHermes_neural-chatv0.1
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: EQ-Bench
      type: eq-bench
      config: EQ-Bench
      split: v2.1
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 77.19
      name: self-reported
    source:
      url: https://github.com/EQ-bench/EQ-Bench
      name: EQ-Bench v2.1
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 70.22
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=giraffe176/WestMaid_HermesMonarchv0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 87.42
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=giraffe176/WestMaid_HermesMonarchv0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.31
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=giraffe176/WestMaid_HermesMonarchv0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 61.99
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=giraffe176/WestMaid_HermesMonarchv0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 82.16
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=giraffe176/WestMaid_HermesMonarchv0.1
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 69.6
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=giraffe176/WestMaid_HermesMonarchv0.1
      name: Open LLM Leaderboard
---
	# WestMaid_HermesMonarchv0.1

	<img src="https://cdn-uploads.huggingface.co/production/uploads/655a9883cbbaec115c3fd6b3/YJTMJZF80hKaKnPDu_yMV.png" alt="drawing" width="800"/>

This model benchmarks well against other 7B models and has exceptional [MT-Bench](https://github.com/lm-sys/FastChat/tree/main/fastchat/llm_judge) and [EQ-Bench v2.1](https://github.com/EQ-bench/EQ-Bench) scores. It ranks above GPT-3.5-turbo and Claude-1 on both tests (MT-Bench 8.02 vs. 7.94 and 7.90; EQ-Bench 77.19 vs. 71.74 and 76.83), and above Goliath-120B and other 70B models on EQ-Bench.

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

	## Merge Details
	### Merge Method

This model was merged using the [DARE](https://arxiv.org/abs/2311.03099) [TIES](https://arxiv.org/abs/2306.01708) merge method (`dare_ties` in mergekit), with [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) as the base.

The same density was used for every merged model: after testing many densities, I settled on 0.58 because it returned the highest EQ-Bench score. The weights were tested much less; instead, I decided to try layer-wise gradients. Conceptually, WestLake and a distilled version of OpenHermes carry more weight in the early layers (guiding understanding and thought), before Noromaid and AlphaMonarch take over in the later layers to guide the model's wants, reasoning, and conversation.
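To make the role of `density` concrete, here is a simplified, pure-Python sketch of DARE-TIES applied to a single flattened parameter tensor. It follows the steps described in the DARE and TIES papers (random drop with 1/density rescaling, sign election, disjoint merge), but it is an illustration under assumptions, not mergekit's implementation: the function name `dare_ties_merge` and its signature are hypothetical, and dividing by the weights of the agreeing models is assumed to resemble mergekit's default normalization.

```python
import random


def sign(x: float) -> int:
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def dare_ties_merge(base, finetuned, weights, density, seed=0):
    """Hypothetical, simplified DARE-TIES merge of one flattened tensor.

    base:      list of floats (base-model parameters)
    finetuned: list of lists, one per fine-tuned model
    weights:   one merge weight per fine-tuned model
    density:   fraction of each task vector's entries that survive DARE
    """
    rng = random.Random(seed)

    # 1. DARE: build each weighted task vector, randomly drop entries,
    #    and rescale the survivors by 1/density.
    deltas = []
    for ft, w in zip(finetuned, weights):
        d = []
        for b, f in zip(base, ft):
            keep = rng.random() < density
            d.append(w * (f - b) / density if keep else 0.0)
        deltas.append(d)

    merged = []
    for i, b in enumerate(base):
        column = [d[i] for d in deltas]
        # 2. TIES sign election: the sign of the summed weighted deltas wins.
        elected = sign(sum(column))
        # 3. Disjoint merge: keep only non-zero entries agreeing with that sign.
        agree = [k for k, v in enumerate(column) if v != 0.0 and sign(v) == elected]
        total_w = sum(weights[k] for k in agree)
        # 4. Normalize by the weights of the agreeing models, add to the base.
        update = sum(column[k] for k in agree) / total_w if total_w > 0 else 0.0
        merged.append(b + update)
    return merged


if __name__ == "__main__":
    # Two models that agree in sign, with no dropping (density = 1.0)
    print(dare_ties_merge([0.0, 0.0], [[1.0, 2.0], [3.0, -2.0]], [0.5, 0.5], 1.0))
    # Sign conflict: the disagreeing model is discarded for that entry
    print(dare_ties_merge([0.0], [[2.0], [-1.0]], [0.5, 0.5], 1.0))
```

In the second call, the second model's delta disagrees with the elected sign and is dropped, so the result is simply the first model's delta. With `density` below 1, roughly `1 - density` of each task vector is zeroed before the sign election, and the 1/density rescaling keeps the expected size of the surviving updates unchanged.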



	### Models Merged

	The following models were included in the merge:
	* [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B)
	* [NeverSleep/Noromaid-7B-0.4-DPO](https://huggingface.co/NeverSleep/Noromaid-7B-0.4-DPO)
	* [senseable/WestLake-7B-v2](https://huggingface.co/senseable/WestLake-7B-v2)
	* [argilla/distilabeled-OpenHermes-2.5-Mistral-7B](https://huggingface.co/argilla/distilabeled-OpenHermes-2.5-Mistral-7B)

	### Configuration

	The following YAML configuration was used to produce this model:

```yaml
models:
  - model: mistralai/Mistral-7B-v0.1
    # No parameters necessary for base model
  - model: senseable/WestLake-7B-v2
    parameters:
      density: 0.58
      weight: [0.50, 0.40, 0.25, 0.05]
  - model: NeverSleep/Noromaid-7B-0.4-DPO
    parameters:
      density: 0.58
      weight: [0.05, 0.05, 0.25, 0.40]
  - model: argilla/distilabeled-OpenHermes-2.5-Mistral-7B
    parameters:
      density: 0.58
      weight: [0.40, 0.50, 0.25, 0.05]
  - model: mlabonne/AlphaMonarch-7B
    parameters:
      density: 0.58
      weight: [0.05, 0.05, 0.25, 0.50]
merge_method: dare_ties
base_model: mistralai/Mistral-7B-v0.1
parameters:
  int8_mask: true
dtype: bfloat16
```
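mergekit reads a list-valued parameter such as `weight` as a gradient over the layer stack. The sketch below shows how the four anchor values in the configuration above could translate into per-layer weights. It assumes the anchors are evenly spaced from the first to the last layer and interpolated linearly, and that Mistral-7B-v0.1 has 32 decoder layers; the helper `gradient_weight` is hypothetical, and mergekit's exact indexing may differ.

```python
# Hypothetical sketch: spreading a list-valued `weight` over layers as a
# linear gradient. Assumes evenly spaced anchors from the first to the last
# layer; mergekit's exact interpolation/indexing may differ.

NUM_LAYERS = 32  # Mistral-7B-v0.1 has 32 decoder layers

# `weight` anchor lists copied from the merge configuration
WEIGHTS = {
    "senseable/WestLake-7B-v2": [0.50, 0.40, 0.25, 0.05],
    "NeverSleep/Noromaid-7B-0.4-DPO": [0.05, 0.05, 0.25, 0.40],
    "argilla/distilabeled-OpenHermes-2.5-Mistral-7B": [0.40, 0.50, 0.25, 0.05],
    "mlabonne/AlphaMonarch-7B": [0.05, 0.05, 0.25, 0.50],
}


def gradient_weight(anchors: list[float], layer: int, num_layers: int = NUM_LAYERS) -> float:
    """Linearly interpolate the anchor values at a given layer index."""
    if len(anchors) == 1:
        return anchors[0]
    # Map the layer index onto the 0..len(anchors)-1 anchor scale
    pos = layer / (num_layers - 1) * (len(anchors) - 1)
    lo = min(int(pos), len(anchors) - 2)  # left anchor, clamped at the last segment
    frac = pos - lo
    return anchors[lo] * (1 - frac) + anchors[lo + 1] * frac


if __name__ == "__main__":
    for layer in (0, 8, 16, 24, 31):
        row = {name.split("/")[1]: round(gradient_weight(w, layer), 3)
               for name, w in WEIGHTS.items()}
        print(f"layer {layer:2d}: {row}")
```

Under this assumption the four weights still sum to 1 at every layer, because each anchor column in the configuration sums to 1 (for example, 0.50 + 0.05 + 0.40 + 0.05 at the first anchor), and linear interpolation preserves that sum.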
	## Benchmark Testing
	### MT-Bench
	![image/png](https://cdn-uploads.huggingface.co/production/uploads/655a9883cbbaec115c3fd6b3/H2BLoovTbLg8d8mtFSKYB.png)

	### EQ-Bench Leaderboard

	<img src="https://cdn-uploads.huggingface.co/production/uploads/655a9883cbbaec115c3fd6b3/0Z6AIhaqCiKREf0fQEVqr.png" alt="drawing" width="800"/>


### Table of Benchmarks

#### Open LLM Leaderboard

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|---|
| giraffe176/WestMaid_HermesMonarchv0.1 | 72.62 | 70.22 | 87.42 | 64.31 | 61.99 | 82.16 | 69.6 |
| mlabonne/AlphaMonarch-7B | 75.99 | 73.04 | 89.18 | 64.4 | 77.91 | 84.69 | 66.72 |
| senseable/WestLake-7B-v2 | 74.68 | 73.04 | 88.65 | 64.71 | 67.06 | 86.98 | 67.63 |
| teknium/OpenHermes-2.5-Mistral-7B | 61.52 | 64.93 | 84.18 | 63.64 | 52.24 | 78.06 | 26.08 |
| NeverSleep/Noromaid-7B-0.4-DPO | 59.08 | 62.29 | 84.32 | 63.2 | 42.28 | 76.95 | 25.47 |
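The Average column appears to be the unweighted mean of the six benchmark scores. The short check below, with scores copied from the table, reproduces it to within rounding; the helper name `leaderboard_average` is illustrative only.

```python
# Recompute the Open LLM Leaderboard "Average" column as the plain mean of
# ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, and GSM8K (copied from the table).
SCORES = {
    "giraffe176/WestMaid_HermesMonarchv0.1": [70.22, 87.42, 64.31, 61.99, 82.16, 69.6],
    "mlabonne/AlphaMonarch-7B": [73.04, 89.18, 64.4, 77.91, 84.69, 66.72],
    "senseable/WestLake-7B-v2": [73.04, 88.65, 64.71, 67.06, 86.98, 67.63],
    "teknium/OpenHermes-2.5-Mistral-7B": [64.93, 84.18, 63.64, 52.24, 78.06, 26.08],
    "NeverSleep/Noromaid-7B-0.4-DPO": [62.29, 84.32, 63.2, 42.28, 76.95, 25.47],
}


def leaderboard_average(scores: list[float]) -> float:
    """Unweighted arithmetic mean of the benchmark scores."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for name, scores in SCORES.items():
        print(f"{name}: {leaderboard_average(scores):.2f}")
```

Noromaid's exact mean is 59.085, so it may print as 59.08 or 59.09 depending on floating-point rounding; the table lists 59.08.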



#### Yet Another LLM Leaderboard benchmarks

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---:|---:|---:|---:|---:|
| [WestMaid_HermesMonarchv0.1](https://huggingface.co/giraffe176/WestMaid_HermesMonarchv0.1) | 45.34 | 76.33 | 61.99 | 46.02 | 57.42 |

#### Misc. Benchmarks

| Model | MT-Bench | EQ-Bench v2.1 |
|---|---|---|
| giraffe176/WestMaid_HermesMonarchv0.1 | 8.021875 | 77.19 (3-shot, ooba) |
| mlabonne/AlphaMonarch-7B | 7.928125 | 76.08 |
| senseable/WestLake-7B-v2 | | 78.7 |
| teknium/OpenHermes-2.5-Mistral-7B | | 66.89 |
| claude-v1 | 7.900000 | 76.83 |
| gpt-3.5-turbo | 7.943750 | 71.74 |

Sources: MT-Bench [(Paper)](https://arxiv.org/abs/2306.05685); EQ-Bench v2.1 [(Paper)](https://arxiv.org/abs/2312.06281), [Leaderboard](https://eqbench.com/).