hermeo-7b / README.md

Update README.md

a21701f verified 11 months ago

5.52 kB

	---
	language:
	- en
	- de
	library_name: transformers
	pipeline_tag: text-generation
	license: apache-2.0
	tags:
	- merge
	- mergekit
	---

	![image/png](https://huggingface.co/datasets/malteos/images/resolve/main/hermeo.medium.png)

	_Hermes + Leo = Hermeo_

	# Hermeo-7B

	A German-English language model merged from [DPOpenHermes-7B-v2](https://huggingface.co/openaccess-ai-collective/DPOpenHermes-7B-v2) and [leo-mistral-hessianai-7b-chat](https://huggingface.co/LeoLM/leo-mistral-hessianai-7b-chat) using [mergekit](https://github.com/cg123/mergekit).
	Both base models are fine-tuned versions of [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1).


	### Model details

	- Merged from: [leo-mistral-hessianai-7b-chat](https://huggingface.co/LeoLM/leo-mistral-hessianai-7b-chat) and [DPOpenHermes-7B-v2](https://huggingface.co/openaccess-ai-collective/DPOpenHermes-7B-v2)
	- Model type: Causal decoder-only transformer language model
	- Languages: English and German
	- License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0.html)

	### How to use

	You can use this model directly with a pipeline for text generation. Since the generation relies on some randomness, we
	set a seed for reproducibility:

	```python
	>>> from transformers import pipeline, set_seed
	>>> generator = pipeline('text-generation', model='malteos/hermeo-7b')
	>>> set_seed(42)
	>>> generator("Hallo, Ich bin ein Sprachmodell,", max_length=40, num_return_sequences=1)
	[{'generated_text': 'Hallo, Ich bin ein Sprachmodell, das dir bei der Übersetzung von Texten zwischen Deutsch und Englisch helfen kann. Wenn du mir einen Text in Deutsch'}]
	```


	### Acknowledgements

	- This model release is heavily inspired by [Weyaxi/OpenHermes-2.5-neural-chat-v3-2-Slerp](https://huggingface.co/Weyaxi/OpenHermes-2.5-neural-chat-v3-2-Slerp)
	- Thanks to the authors of the base models: [Mistral](https://mistral.ai/), [LAION](https://laion.ai/), [HessianAI](https://hessian.ai/), [Open Access AI Collective](https://huggingface.co/openaccess-ai-collective), [@teknium](https://huggingface.co/teknium), [@bjoernp](https://huggingface.co/bjoernp)
	- The [German evaluation datasets and scripts](https://github.com/bjoernpl/lm-evaluation-harness-de/tree/mmlu_de) from [@bjoernp](https://huggingface.co/bjoernp) were used.
	- The computing resources from [DFKI's PEGASUS cluster](https://pegasus.dfki.de/) were used for the evaluation.


	## Evaluation

	The evaluation methdology of the [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) is followed.

	### German benchmarks

	\| German tasks: \| MMLU-DE \| Hellaswag-DE \| ARC-DE \|Average \|
	\|-------------------------------\|-------------\|---------------\|--------------\|--------------\|
	\| Models / Few-shots: \| _(5 shots)_ \| _(10 shots)_ \| _(24 shots)_ \| \|
	\| _7B parameters_ \| \| \| \| \|
	\| llama-2-7b \| 0.400 \| 0.513 \| 0.381 \| 0.431 \|
	\| leo-hessianai-7b \| 0.400 \| 0.609 \| 0.429 \| 0.479 \|
	\| bloom-6b4-clp-german \| 0.274 \| 0.550 \| 0.351 \| 0.392 \|
	\| mistral-7b \| 0.524 \| 0.588 \| 0.473 \| 0.528 \|
	\| leo-mistral-hessianai-7b \| 0.481 \| 0.663 \| 0.485 \| 0.543 \|
	\| leo-mistral-hessianai-7b-chat \| 0.458 \| 0.617 \| 0.465 \| 0.513 \|
	\| DPOpenHermes-7B-v2 \| 0.517 \| 0.603 \| 0.515 \| 0.545 \|
	\| hermeo-7b (this model) \| 0.511 \| 0.668 \| 0.528 \| 0.569 \|
	\| _13B parameters_ \| \| \| \| \|
	\| llama-2-13b \| 0.469 \| 0.581 \| 0.468 \| 0.506 \|
	\| leo-hessianai-13b \| 0.486 \| 0.658 \| 0.509 \| 0.551 \|
	\| _70B parameters_ \| \| \| \| \|
	\| llama-2-70b \| 0.597 \| 0.674 \| 0.561 \| 0.611 \|
	\| leo-hessianai-70b \| 0.653 \| 0.721 \| 0.600 \| 0.658 \|

	### English benchmarks

	\| English tasks: \| MMLU \| Hellaswag \| ARC \| Average \|
	\|----------------------------\|-------------\|---------------\|--------------\|-------------\|
	\| Models / Few-shots: \| _(5 shots)_ \| _(10 shots)_ \| _(24 shots)_ \| \|
	\| llama-2-7b \| 0.466 \| 0.786 \| 0.530 \| 0.594 \|
	\| leolm-hessianai-7b \| 0.423 \| 0.759 \| 0.522 \| 0.568 \|
	\| bloom-6b4-clp-german \| 0.264 \| 0.525 \| 0.328 \| 0.372 \|
	\| mistral-7b \| 0.635 \| 0.832 \| 0.607 \| 0.691 \|
	\| leolm-mistral-hessianai-7b \| 0.550 \| 0.777 \| 0.518 \| 0.615 \|
	\| hermeo-7b (this model) \| 0.601 \| 0.821 \| 0.620 \| 0.681 \|

	## Prompting / Prompt Template

	Prompt dialogue template (ChatML format):

	```
	"""
	<\|im_start\|>system
	{system_message}<\|im_end\|>
	<\|im_start\|>user
	{prompt}<\|im_end\|>
	<\|im_start\|>assistant
	"""
	```

	The model input can contain multiple conversation turns between user and assistant, e.g.
	```
	<\|im_start\|>user
	{prompt 1}<\|im_end\|>
	<\|im_start\|>assistant
	{reply 1}<\|im_end\|>
	<\|im_start\|>user
	{prompt 2}<\|im_end\|>
	<\|im_start\|>assistant
	(...)
	```

	## License

	[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0.html)

	## See also

	- AWQ quantized version: https://huggingface.co/mayflowergmbh/hermeo-7b-awq