malteos
/

hermeo-7b

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

hermeo-7b / README.md

malteos's picture

Files uploaded (1/9)

ae27b01 about 1 year ago

|

3.86 kB

	---
	language:
	- en
	- de
	library_name: transformers
	pipeline_tag: text-generation
	license: apache-2.0
	---

	![image/png](https://huggingface.co/datasets/malteos/images/resolve/main/hermeo.medium.png)

	_Hermes + Leo = Hermeo_

	# Hermeo-7B

	A German-English language model merged from [DPOpenHermes-7B-v2](https://huggingface.co/openaccess-ai-collective/DPOpenHermes-7B-v2) and [leo-mistral-hessianai-7b-chat](https://huggingface.co/LeoLM/leo-mistral-hessianai-7b-chat) using [mergekit](https://github.com/cg123/mergekit).
	Both base models are fine-tuned versions of [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1).


	### Model details

	- Merged from: [leo-mistral-hessianai-7b-chat](https://huggingface.co/LeoLM/leo-mistral-hessianai-7b-chat) and [DPOpenHermes-7B-v2](https://huggingface.co/openaccess-ai-collective/DPOpenHermes-7B-v2)
	- Model type: Causal decoder-only transformer language model
	- Languages: English and German
	- License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0.html)

	### Acknowledgements

	- This model release is heavily inspired by [Weyaxi/OpenHermes-2.5-neural-chat-v3-2-Slerp](https://huggingface.co/Weyaxi/OpenHermes-2.5-neural-chat-v3-2-Slerp)
	- Thanks to the authors of the base models: [Mistral](https://mistral.ai/), [LAION](https://laion.ai/), [HessianAI](https://hessian.ai/), [Open Access AI Collective](https://huggingface.co/openaccess-ai-collective), [@teknium](https://huggingface.co/teknium), [@bjoernp](https://huggingface.co/bjoernp)
	- The [German evaluation datasets and scripts](https://github.com/bjoernpl/lm-evaluation-harness-de/tree/mmlu_de) from [@bjoernp](https://huggingface.co/bjoernp) were used.
	- The computing resources from [DFKI's PEGASUS cluster](https://pegasus.dfki.de/) were used for the evaluation.


	## Evaluation

	The evaluation methdology of the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) is followed.

	### German benchmarks

	\| German tasks: \| MMLU-DE \| Hellaswag-DE \| ARC-DE \|
	\|-------------------------------\|-------------\|---------------\|--------------\|
	\| Models / Few-shots: \| _(5 shots)_ \| _(10 shots)_ \| _(24 shots)_ \|
	\| _7B parameters_ \| \| \| \|
	\| llama-2-7b \| 0.400 \| 0.513 \| 0.381 \|
	\| leo-hessianai-7b \| 0.400 \| 0.609 \| 0.429 \|
	\| bloom-6b4-clp-german \| 0.274 \| 0.550 \| 0.351 \|
	\| mistral-7b \| 0.524 \| 0.588 \| 0.473 \|
	\| leo-mistral-hessianai-7b \| 0.481 \| 0.663 \| 0.485 \|
	\| leo-mistral-hessianai-7b-chat \| 0.458 \| 0.617 \| 0.465 \|
	\| DPOpenHermes-7B-v2 \| TBA \| 0.603 \| 0.515 \|
	\| hermeo-7b (this model) \| 0.511 \| 0.668 \| 0.528 \|
	\| _13B parameters_ \| \| \| \|
	\| llama-2-13b \| 0.469 \| 0.581 \| 0.468 \|
	\| leo-hessianai-13b \| 0.486 \| 0.658 \| 0.509 \|
	\| _70B parameters_ \| \| \| \|
	\| llama-2-70b \| 0.597 \| 0.674 \| 0.561 \|
	\| leo-hessianai-70b \| 0.653 \| 0.721 \| 0.600 \|

	### English benchmarks

	TBA

	## Prompting / Prompt Template

	Prompt dialogue template (ChatML format):

	```
	"""
	<\|im_start\|>system
	{system_message}<\|im_end\|>
	<\|im_start\|>user
	{prompt}<\|im_end\|>
	<\|im_start\|>assistant
	"""
	```

	The model input can contain multiple conversation turns between user and assistant, e.g.
	```
	<\|im_start\|>user
	{prompt 1}<\|im_end\|>
	<\|im_start\|>assistant
	{reply 1}<\|im_end\|>
	<\|im_start\|>user
	{prompt 2}<\|im_end\|>
	<\|im_start\|>assistant
	(...)
	```

	## License

	[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0.html)