	---
	license: llama3
	language:
	- de
	- en
	library_name: transformers
	---

	# Llama3_DiscoLeo-Instruct-8B-32k-v0.1-4bit-awq

	This model is a 4-bit quantization of [DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1](https://huggingface.co/DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1),
	created using [AutoAWQ](https://github.com/casper-hansen/AutoAWQ) with a custom bilingual calibration dataset and `quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}`.
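
To give a feel for what `w_bit`, `q_group_size` and `zero_point` control, here is a minimal plain-Python sketch of group-wise 4-bit quantization with a zero point. It is an illustration under simplifying assumptions, not AutoAWQ's implementation: it leaves out the activation-aware scaling that the calibration dataset drives and the `GEMM` weight packing. The helper names and the sample weights are made up for this example.

```python
# Illustrative sketch only: group-wise asymmetric quantization in plain Python.
# It does not reproduce AutoAWQ (no activation-aware scaling, no GEMM packing).

W_BIT = 4               # bits per weight, as in quant_config["w_bit"]
GROUP_SIZE = 128        # weights sharing one scale and zero point ("q_group_size")
Q_MAX = 2**W_BIT - 1    # largest integer code, 15 for 4 bits


def quantize_group(weights):
    """Map one group of floats to integer codes in [0, Q_MAX], plus scale and zero point."""
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / Q_MAX or 1.0   # fall back to 1.0 for a constant group
    zero = max(0, min(Q_MAX, round(-w_min / scale)))
    codes = [max(0, min(Q_MAX, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Recover approximate float weights from the integer codes."""
    return [(c - zero) * scale for c in codes]


def quantize_row(row):
    """Split a weight row into groups of GROUP_SIZE and quantize each one independently."""
    return [quantize_group(row[i:i + GROUP_SIZE]) for i in range(0, len(row), GROUP_SIZE)]


# Made-up weights in [-0.5, 0.5]; a real layer would supply its own values.
row = [((i * 37) % 101 - 50) / 100 for i in range(256)]
groups = quantize_row(row)
codes, scale, zero = groups[0]
restored = dequantize_group(codes, scale, zero)
print(len(groups), max(abs(a - b) for a, b in zip(row[:GROUP_SIZE], restored)))
```

With a group size of 128, every 128 consecutive weights share one scale and one zero point, so the rounding error inside a group is bounded by about half that group's scale.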

	Copy of original model card:

	# Llama3-DiscoLeo-Instruct 8B (version 0.1)

	## Thanks and Accreditation

	[DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1](https://huggingface.co/collections/DiscoResearch/discoleo-8b-llama3-for-german-6650527496c0fafefd4c9729)
	is the result of a joint effort between [DiscoResearch](https://huggingface.co/DiscoResearch) and [Occiglot](https://huggingface.co/occiglot)
	with support from the [DFKI](https://www.dfki.de/web/) (German Research Center for Artificial Intelligence) and [hessian.Ai](https://hessian.ai).
	Occiglot kindly handled data preprocessing, filtering, and deduplication as part of their latest [dataset release](https://huggingface.co/datasets/occiglot/occiglot-fineweb-v0.5), and shared their compute allocation on hessian.Ai's 42 supercomputer.

	## Model Overview

	Llama3-DiscoLeo-Instruct-8B-v0.1 is an instruction-tuned version of our [Llama3-German-8B](https://huggingface.co/DiscoResearch/Llama3-German-8B).
	The base model was derived from [Meta's Llama3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) through continued pretraining on 65 billion high-quality German tokens, similar to previous [LeoLM](https://huggingface.co/LeoLM) or [Occiglot](https://huggingface.co/collections/occiglot/occiglot-eu5-7b-v01-65dbed502a6348b052695e01) models.
	We finetuned this checkpoint on the German Instruction dataset from DiscoResearch created by [Jan-Philipp Harries](https://huggingface.co/jphme) and [Daniel Auras](https://huggingface.co/rasdani) ([DiscoResearch](https://huggingface.co/DiscoResearch), [ellamind](https://ellamind.com)).


	## How to use
	Llama3-DiscoLeo-Instruct-8B-v0.1 uses the [Llama-3 chat template](https://github.com/meta-llama/llama3?tab=readme-ov-file#instruction-tuned-models), which can be applied with [Transformers' chat templating](https://huggingface.co/docs/transformers/main/en/chat_templating).
	See [below](https://huggingface.co/DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1#usage-example) for a usage example.
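
For orientation, the following sketch builds a Llama-3-style prompt string by hand. It assumes the standard Llama-3 special tokens and that this model's tokenizer ships the usual template; `format_llama3_chat` is a hypothetical helper written for this illustration. In practice the tokenizer's own `apply_chat_template` is authoritative.

```python
# Hand-rolled illustration of the Llama-3 chat layout (assumed standard tokens).
# Prefer tokenizer.apply_chat_template in real code; this only shows the structure.

def format_llama3_chat(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into a Llama-3-style prompt string."""
    text = "<|begin_of_text|>"
    for message in messages:
        # Each turn is a role header, a blank line, the content, then an end-of-turn token.
        text += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        text += f"{message['content'].strip()}<|eot_id|>"
    if add_generation_prompt:
        # An open assistant header tells the model that it should answer next.
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text


messages = [
    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
    {"role": "user", "content": "Was ist die Hauptstadt von Hessen?"},
]
print(format_llama3_chat(messages))
```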

	## Model Training and Hyperparameters
	The model was fully fine-tuned with axolotl on the [hessian.Ai 42](https://hessian.ai) supercomputer with a context length of 8192, a learning rate of 2e-5, and a batch size of 16.


	## Evaluation and Results

	We evaluated the model using a suite of common English benchmarks and their German counterparts with [GermanBench](https://github.com/bjoernpl/GermanBenchmark).

	The image and corresponding table below show the benchmark scores for the different instruct models compared to Meta's instruct version. All checkpoints are available in this [collection](https://huggingface.co/collections/DiscoResearch/discoleo-8b-llama3-for-german-6650527496c0fafefd4c9729).

	![instruct scores](instruct_model_benchmarks.png)

	| Model | truthful_qa_de | truthfulqa_mc | arc_challenge | arc_challenge_de | hellaswag | hellaswag_de | MMLU | MMLU-DE | mean |
	|---|---|---|---|---|---|---|---|---|---|
	| meta-llama/Meta-Llama-3-8B-Instruct | 0.47498 | 0.43923 | 0.59642 | 0.47952 | 0.82025 | 0.60008 | 0.66658 | 0.53541 | 0.57656 |
	| DiscoResearch/Llama3-German-8B | 0.49499 | 0.44838 | 0.55802 | 0.49829 | 0.79924 | 0.65395 | 0.62240 | 0.54413 | 0.57743 |
	| DiscoResearch/Llama3-German-8B-32k | 0.48920 | 0.45138 | 0.54437 | 0.49232 | 0.79078 | 0.64310 | 0.58774 | 0.47971 | 0.55982 |
	| DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1 | 0.53042 | 0.52867 | 0.59556 | 0.53839 | 0.80721 | 0.66440 | 0.61898 | 0.56053 | 0.60552 |
	| DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1 | 0.52749 | 0.53245 | 0.58788 | 0.53754 | 0.80770 | 0.66709 | 0.62123 | 0.56238 | 0.60547 |
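
The `mean` column appears to be the unweighted average of the eight benchmark scores in each row. The card does not state this; it is inferred from the numbers. The short check below recomputes it from the table values. The `rows` dictionary and the tolerance are choices made for this sketch.

```python
# Recompute the "mean" column as a plain average of the eight per-benchmark scores.
# Assumption: the reported mean is unweighted (inferred from the table, not stated).

rows = {
    "meta-llama/Meta-Llama-3-8B-Instruct":
        ([0.47498, 0.43923, 0.59642, 0.47952, 0.82025, 0.60008, 0.66658, 0.53541], 0.57656),
    "DiscoResearch/Llama3-German-8B":
        ([0.49499, 0.44838, 0.55802, 0.49829, 0.79924, 0.65395, 0.62240, 0.54413], 0.57743),
    "DiscoResearch/Llama3-German-8B-32k":
        ([0.48920, 0.45138, 0.54437, 0.49232, 0.79078, 0.64310, 0.58774, 0.47971], 0.55982),
    "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1":
        ([0.53042, 0.52867, 0.59556, 0.53839, 0.80721, 0.66440, 0.61898, 0.56053], 0.60552),
    "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-32k-v0.1":
        ([0.52749, 0.53245, 0.58788, 0.53754, 0.80770, 0.66709, 0.62123, 0.56238], 0.60547),
}


def mean_matches(scores, reported, tol=1e-5):
    """Return True if the average of `scores` agrees with `reported` within `tol`."""
    return abs(sum(scores) / len(scores) - reported) <= tol


for name, (scores, reported) in rows.items():
    print(f"{name}: recomputed={sum(scores) / len(scores):.5f} reported={reported:.5f}")
```

The tolerance allows for the table's scores already being rounded to five decimal places.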

	## Model Configurations

	We release DiscoLeo-8B in the following configurations:
	1. [Base model with continued pretraining](https://huggingface.co/DiscoResearch/Llama3_German_8B)
	2. [Long-context version (32k context length)](https://huggingface.co/DiscoResearch/Llama3_German_8B_32k)
	3. [Instruction-tuned version of the base model](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_Instruct_8B_v0.1) (This model)
	4. [Instruction-tuned version of the long-context model](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_Instruct_8B_32k_v0.1)
	5. [Experimental `DARE-TIES` Merge with Llama3-Instruct](https://huggingface.co/DiscoResearch/Llama3_DiscoLeo_8B_DARE_Experimental)
	6. [Collection of Quantized versions](https://huggingface.co/collections/DiscoResearch/discoleo-8b-quants-6651bcf8f72c9a37ce485d42)

	## Usage Example
	Here's how to use the model with transformers:
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# Load the model and tokenizer; device_map="auto" places the weights on the available device(s)
	model = AutoModelForCausalLM.from_pretrained(
	    "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1",
	    torch_dtype="auto",
	    device_map="auto"
	)
	tokenizer = AutoTokenizer.from_pretrained("DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1")

	prompt = "Schreibe ein Essay über die Bedeutung der Energiewende für Deutschlands Wirtschaft"
	messages = [
	    {"role": "system", "content": "Du bist ein hilfreicher Assistent."},
	    {"role": "user", "content": prompt}
	]
	# Render the conversation with the chat template and append the assistant header
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True
	)
	# Tokenize and move the inputs to the same device as the model
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	generated_ids = model.generate(
	    model_inputs.input_ids,
	    max_new_tokens=512
	)
	# generate() returns prompt + completion; keep only the newly generated tokens
	generated_ids = [
	    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
	]

	response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	```
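
The slicing step in the example above exists because, for decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones. Here is a minimal sketch of that step on plain lists, using made-up token ids and a hypothetical helper name:

```python
# Illustration of trimming the prompt from generate()-style output, using plain lists.
# The token ids are invented; real ids come from the tokenizer and the model.

def strip_prompt(input_ids_batch, output_ids_batch):
    """Drop the leading prompt tokens from each output sequence in the batch."""
    return [
        output_ids[len(input_ids):]
        for input_ids, output_ids in zip(input_ids_batch, output_ids_batch)
    ]


prompt_ids = [[1, 15, 27]]              # one tokenized prompt
output_ids = [[1, 15, 27, 42, 43, 2]]   # prompt followed by three new tokens
print(strip_prompt(prompt_ids, output_ids))
```

Without this step, decoding the output would repeat the system and user messages in front of the model's answer.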

	## Acknowledgements

	The model was trained and evaluated by [Björn Plüster](https://huggingface.co/bjoernp) ([DiscoResearch](https://huggingface.co/DiscoResearch), [ellamind](https://ellamind.com)) with data preparation and project supervision by [Manuel Brack](http://manuel-brack.eu) ([DFKI](https://www.dfki.de/web/), [TU-Darmstadt](https://www.tu-darmstadt.de/)). Instruction tuning was done with the DiscoLM German dataset created by [Jan-Philipp Harries](https://huggingface.co/jphme) and [Daniel Auras](https://huggingface.co/rasdani) ([DiscoResearch](https://huggingface.co/DiscoResearch), [ellamind](https://ellamind.com)). We extend our gratitude to [LAION](https://laion.ai/) and friends, especially [Christoph Schuhmann](https://entwickler.de/experten/christoph-schuhmann) and [Jenia Jitsev](https://huggingface.co/JJitsev), for initiating this collaboration.

	The model training was supported by a compute grant at the [42 supercomputer](https://hessian.ai/), which is a central component in the development of [hessian AI](https://hessian.ai/), the [AI Innovation Lab](https://hessian.ai/infrastructure/ai-innovationlab/) (funded by the [Hessian Ministry of Higher Education, Research and the Arts (HMWK)](https://wissenschaft.hessen.de) and the [Hessian Ministry of the Interior, for Security and Homeland Security (HMinD)](https://innen.hessen.de)) and the [AI Service Centers](https://hessian.ai/infrastructure/ai-service-centre/) (funded by the [Federal Ministry of Education and Research (BMBF)](https://www.bmbf.de/)).
	The curation of the training data is partially funded by the [German Federal Ministry for Economic Affairs and Climate Action (BMWK)](https://www.bmwk.de/Navigation/EN/Home/home.html)
	through the project [OpenGPT-X](https://opengpt-x.de/en/) (project no. 68GX21007D).