	---
	base_model: teknium/OpenHermes-2.5-Mistral-7B
	tags:
	- mistral
	- instruct
	- finetune
	- chatml
	- gpt4
	- synthetic data
	- distillation
	- dpo
	- rlhf
	license: apache-2.0
	language:
	- en
	datasets:
	- mlabonne/chatml_dpo_pairs
	---

	<center><img src="https://i.imgur.com/qIhaFNM.png"></center>

	# NeuralHermes 2.5 - Mistral 7B - GGUF

	NeuralHermes is an [OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) model that has been further fine-tuned with Direct Preference Optimization (DPO) using the [mlabonne/chatml_dpo_pairs](https://huggingface.co/datasets/mlabonne/chatml_dpo_pairs) dataset.

	It is directly inspired by the RLHF process that the authors of [neural-chat-7b-v3-1](https://huggingface.co/Intel/neural-chat-7b-v3-1) describe for improving performance. I used the same dataset and reformatted it to apply the ChatML template. I haven't run a comprehensive evaluation of the model, but it appears to work well and nothing seems broken. :)
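
As a rough illustration of that reformatting step, the sketch below turns one preference pair into ChatML-style `prompt`, `chosen`, and `rejected` strings. The column names (`system`, `question`, `chosen`, `rejected`), the sample record, and the `to_chatml_pair` helper are assumptions made for illustration; they are not necessarily the exact code used to build the dataset.

```python
# Hypothetical helper: wrap one preference pair in ChatML markers.
IM_START, IM_END = "<|im_start|>", "<|im_end|>"


def to_chatml_pair(example: dict) -> dict:
    """Build prompt/chosen/rejected strings in ChatML form (assumed column names)."""
    prompt = ""
    # The system turn is optional
    if example.get("system"):
        prompt += f"{IM_START}system\n{example['system']}{IM_END}\n"
    prompt += f"{IM_START}user\n{example['question']}{IM_END}\n"
    # Open the assistant turn so that each answer continues from here
    prompt += f"{IM_START}assistant\n"
    return {
        "prompt": prompt,
        "chosen": example["chosen"] + IM_END + "\n",
        "rejected": example["rejected"] + IM_END + "\n",
    }


# Made-up record, for illustration only
example = {
    "system": "You are a helpful assistant.",
    "question": "What is 2 + 2?",
    "chosen": "2 + 2 equals 4.",
    "rejected": "It is 5.",
}
pair = to_chatml_pair(example)
print(pair["prompt"] + pair["chosen"])
```

Keeping the prompt separate from the two answers mirrors the three-field layout that preference-based trainers typically expect.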

	The code to train this model is available on [Google Colab](https://colab.research.google.com/drive/15iFBr1xWgztXvhrj5I9fBv20c7CFOPBE?usp=sharing) and [GitHub](https://github.com/mlabonne/llm-course/tree/main). Training took about an hour on an A100 GPU.

	Link to the original model: [mlabonne/NeuralHermes-2.5-Mistral-7B](https://huggingface.co/mlabonne/NeuralHermes-2.5-Mistral-7B).

	Article and code to quantize your own LLMs: [Quantize Llama models with GGUF and llama.cpp](https://mlabonne.github.io/blog/posts/Quantize_Llama_2_models_using_ggml.html)

	## Usage

	You can run the GGUF files in this repository using [LM Studio](https://lmstudio.ai/) or any other GGUF-compatible frontend.

	You can also run the original (unquantized) model with `transformers` using the following code:

	```python
	import transformers
	from transformers import AutoTokenizer

	# Original, unquantized model (the GGUF files target llama.cpp-based runtimes)
	new_model = "mlabonne/NeuralHermes-2.5-Mistral-7B"

	# Format prompt with the ChatML chat template
	message = [
	    {"role": "system", "content": "You are a helpful assistant chatbot."},
	    {"role": "user", "content": "What is a Large Language Model?"}
	]
	tokenizer = AutoTokenizer.from_pretrained(new_model)
	prompt = tokenizer.apply_chat_template(message, add_generation_prompt=True, tokenize=False)

	# Create pipeline
	pipeline = transformers.pipeline(
	    "text-generation",
	    model=new_model,
	    tokenizer=tokenizer
	)

	# Generate text
	sequences = pipeline(
	    prompt,
	    do_sample=True,
	    temperature=0.7,
	    top_p=0.9,
	    num_return_sequences=1,
	    max_length=200,
	)
	print(sequences[0]['generated_text'])
	```


	## Training hyperparameters

	LoRA:
	* r=16
	* lora_alpha=16
	* lora_dropout=0.05
	* bias="none"
	* task_type="CAUSAL_LM"
	* target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
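
To give a feel for what these LoRA settings imply, here is a back-of-the-envelope count of the trainable adapter parameters. The layer shapes (hidden size 4096, MLP size 14336, key/value projection width 1024, 32 layers) are assumptions taken from the published Mistral 7B configuration, not values stated in this README.

```python
# LoRA settings from the list above
R, LORA_ALPHA = 16, 16

# Assumed Mistral 7B shapes (not stated in this README)
HIDDEN, INTERMEDIATE, KV_DIM, NUM_LAYERS = 4096, 14336, 1024, 32

# (in_features, out_features) for each targeted linear layer
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int = R) -> int:
    """Each adapter adds A (r x in) and B (out x r); bias="none" adds nothing else."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in shapes.values())
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {total:,}")
print(f"scaling factor: {scaling}")
```

Under these assumptions the adapters hold roughly 42 million trainable parameters, a small fraction of the 7B base weights, and `lora_alpha / r` gives a scaling factor of 1.0.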

	Training arguments:
	* per_device_train_batch_size=4
	* gradient_accumulation_steps=4
	* gradient_checkpointing=True
	* learning_rate=5e-5
	* lr_scheduler_type="cosine"
	* max_steps=200
	* optim="paged_adamw_32bit"
	* warmup_steps=100
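
The following sketch works through what these training arguments imply: the effective batch size, the number of preference pairs seen, and the learning rate at a few steps. It assumes a single GPU and the usual linear warmup followed by cosine decay to zero; the `lr_at` helper is hypothetical and only approximates what the scheduler does internally.

```python
import math

# Values from the training arguments above
BASE_LR, WARMUP_STEPS, MAX_STEPS = 5e-5, 100, 200
PER_DEVICE_BATCH, GRAD_ACCUM = 4, 4
NUM_GPUS = 1  # assumption: a single GPU


def lr_at(step: int) -> float:
    """Linear warmup to BASE_LR, then cosine decay to zero at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
pairs_seen = effective_batch * MAX_STEPS

print(f"effective batch size: {effective_batch}")
print(f"preference pairs seen: {pairs_seen}")
for step in (0, 50, 100, 150, 200):
    print(f"step {step:3d}: lr = {lr_at(step):.2e}")
```

With `warmup_steps=100` and `max_steps=200`, half of the run is spent warming up, so the learning rate only reaches its peak of 5e-5 at the midpoint before decaying.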

	DPOTrainer:
	* beta=0.1
	* max_prompt_length=1024
	* max_length=1536
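
To illustrate the role of `beta`, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. The `dpo_loss` helper and the log-probability values are made up for illustration; the actual trainer computes the same quantity over batches of token-level log-probabilities.

```python
import math

BETA = 0.1  # value from the DPOTrainer settings above


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = BETA) -> float:
    """Standard DPO loss for one pair: -log(sigmoid(chosen_reward - rejected_reward))."""
    # Implicit rewards: how far the policy has moved from the frozen reference model
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(x)) is the same as log(1 + exp(-x))
    return math.log1p(math.exp(-margin))


# Policy identical to the reference: zero margin, so the loss equals log(2)
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))

# Policy prefers the chosen answer more than the reference does: lower loss
print(dpo_loss(-8.0, -14.0, -10.0, -12.0))
```

A smaller `beta` shrinks the implicit rewards, so the policy has to move further from the reference model to achieve the same reduction in loss.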