	---
	tags:
	- mistral
	- conversational
	- text-generation-inference
	- mergekit
	- merge
	base_model:
	- Sao10K/MN-12B-Lyra-v2a1
	- migtissera/Tess-3-Mistral-Nemo-12B
	- TheDrummer/Rocinante-12B-v1.1
	library_name: transformers
	---
	> [!WARNING]
	> General Use Sampling:<br>
	> Mistral-Nemo-12B is very sensitive to the temperature sampler. Try values near 0.3 at first, or you will get some weird results. MistralAI mentions this in the [Transformers](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407#transformers) section of their model card.

	> [!NOTE]
	> Best Samplers:<br>
	> I had the best results with the following settings for Hollow-Tail-V1-12B:<br>
	> Temperature: `1.2`<br>
	> Top K: `-1`<br>
	> Min P: `0.05`<br>
	> Rep Penalty: `1.08`<br>
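
The sampler values above can be mapped onto `transformers` generation arguments. This is a minimal sketch, not a tested recipe for this model: the helper name `to_transformers_kwargs` is hypothetical, and it assumes that `-1` for Top K means "disabled", which `transformers` expresses as `top_k=0`. The `min_p` argument requires a reasonably recent `transformers` release.

```python
# Sampler values copied from the note above.
CARD_SAMPLERS = {
    "temperature": 1.2,
    "top_k": -1,        # -1 on the card, read here as "top-k disabled" (assumption)
    "min_p": 0.05,
    "rep_penalty": 1.08,
}


def to_transformers_kwargs(card: dict) -> dict:
    """Translate the card's sampler names into `generate()` keyword arguments."""
    top_k = card["top_k"]
    return {
        "do_sample": True,
        "temperature": card["temperature"],
        # transformers disables top-k filtering with 0 rather than -1.
        "top_k": 0 if top_k < 0 else top_k,
        "min_p": card["min_p"],
        "repetition_penalty": card["rep_penalty"],
    }


gen_kwargs = to_transformers_kwargs(CARD_SAMPLERS)
print(gen_kwargs)

# Usage, assuming `model` and tokenized `inputs` are already loaded:
# output_ids = model.generate(**inputs, max_new_tokens=256, **gen_kwargs)
```

Other backends name these settings differently, so treat the mapping as a starting point and check your backend's documentation.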

	# Results
	Disclaimer: This is a model merge!

	Seems to be a bit smarter than I expected from my experience. It may need a bit of guidance through system prompts at the beginning, but it was quite fun to use.

	One thing I've realized with Mistral-Nemo is that the model does not really seem to end its responses correctly, so instead I use the format below. Obviously this is just my personal experience, but I find it to be a good setup. I strongly recommend you experiment with different system formats and see which works better for you. Note that none of these models are fine-tuned for this specific format; I believe most of them are fine-tuned on Mistral's original `[INST]` and `[/INST]` format or ChatML.
	```
	<[start_system]>
	You are a professional writer.
	<[STOP]>
	<[start_prompt]>
	User prompt here.
	<[STOP]>
	<[start_model]>
	Model response here.
	<[STOP]>
	```
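
The format above can be assembled programmatically. The following is a small sketch; the function names `format_block` and `build_prompt` are hypothetical, and the exact newline placement is an assumption read off the example. The prompt is left open after `<[start_model]>` so the model writes the response, and `<[STOP]>` would presumably be registered as a stop string in whatever frontend or backend you use.

```python
STOP = "<[STOP]>"  # end-of-block marker from the format above


def format_block(role: str, text: str) -> str:
    """Wrap one block as <[start_ROLE]> ... <[STOP]>, one element per line."""
    return f"<[start_{role}]>\n{text}\n{STOP}\n"


def build_prompt(system: str, user: str) -> str:
    """Build a single-turn prompt that ends where the model should start writing."""
    return (
        format_block("system", system)
        + format_block("prompt", user)
        + "<[start_model]>\n"
    )


prompt = build_prompt("You are a professional writer.", "User prompt here.")
print(prompt)
```

For multi-turn chats, earlier model replies can be added with `format_block("model", reply)` before the next user block.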

	Original Models: <br>
	- [Sao10K/MN-12B-Lyra-v2a1](https://huggingface.co/Sao10K/MN-12B-Lyra-v2a1) (Thank you so much for your work ♥)
	- [migtissera/Tess-3-Mistral-Nemo-12B](https://huggingface.co/migtissera/Tess-3-Mistral-Nemo-12B) (Thank you so much for your work ♥)
	- [TheDrummer/Rocinante-12B-v1.1](https://huggingface.co/TheDrummer/Rocinante-12B-v1.1) (Thank you so much for your work ♥)

	GGUF Quants: <br>
	- [starble-dev/Hollow-Tail-V1-12B-GGUF](https://huggingface.co/starble-dev/Hollow-Tail-V1-12B-GGUF)

	Original Model Licenses: <br>
	- Sao10K/MN-12B-Lyra-v2a1 is licensed under Creative Commons Attribution Non Commercial 4.0 (CC BY-NC 4.0)
	- migtissera/Tess-3-Mistral-Nemo-12B is licensed under Apache 2.0
	- TheDrummer/Rocinante-12B-v1.1 does not specify a license

	---

	# Hollow-Tail-V1-12B

	This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

	## Merge Details
	### Merge Method

	This model was merged using the [linear](https://arxiv.org/abs/2203.05482) merge method, with TheDrummer/Rocinante-12B-v1.1 as the base.

	### Models Merged

	The following models were included in the merge:
	* migtissera/Tess-3-Mistral-Nemo-12B
	* Sao10K/MN-12B-Lyra-v2a1
	* TheDrummer/Rocinante-12B-v1.1

	### Configuration

	The following YAML configuration was used to produce this model:

	```yaml
	models:
	  - model: Sao10K/MN-12B-Lyra-v2a1
	    parameters:
	      weight: 0.8
	  - model: migtissera/Tess-3-Mistral-Nemo-12B
	    parameters:
	      weight: 0.2
	  - model: TheDrummer/Rocinante-12B-v1.1
	    parameters:
	      weight: 0.8
	merge_method: linear
	base_model: TheDrummer/Rocinante-12B-v1.1
	parameters:
	  normalize: true
	  int8_mask: true
	dtype: bfloat16
	```
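
To illustrate what the weights in this configuration mean, here is a toy sketch of a linear merge with `normalize: true`, assuming normalization divides the weighted sum by the total of the weights. The function `linear_merge` and the two-element "tensors" are hypothetical stand-ins; this is not mergekit's implementation, only the weighted average it is understood to compute per parameter tensor.

```python
def linear_merge(tensors, weights, normalize=True):
    """Weighted sum of same-shaped tensors (plain lists here), optionally normalized."""
    total = sum(weights) if normalize else 1.0
    size = len(tensors[0])
    return [
        sum(w * t[i] for w, t in zip(weights, tensors)) / total
        for i in range(size)
    ]


# Weights from the configuration above: Lyra, Tess, Rocinante.
weights = [0.8, 0.2, 0.8]

# Toy stand-ins for one parameter tensor from each model.
lyra = [1.0, 2.0]
tess = [3.0, 4.0]
rocinante = [5.0, 6.0]

merged = linear_merge([lyra, tess, rocinante], weights)
effective = [round(w / sum(weights), 3) for w in weights]

print(merged)
print(effective)  # [0.444, 0.111, 0.444]
```

Because the weights sum to 1.8, normalization means Lyra and Rocinante each contribute about 44% and Tess about 11% of every merged parameter.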