Update README.md

c162c00 verified about 1 month ago

3.76 kB

	---
	license: apache-2.0
	tags:
	- merge
	- mergekit
	- lazymergekit
	- Locutusque/Hercules-6.1-Llama-3.1-8B
	- Sao10K/Llama-3.1-8B-Stheno-v3.4
	base_model:
	- Locutusque/Hercules-6.1-Llama-3.1-8B
	---
	README.md

	# ZeroXClem/Stheno-Hercules-3.1-8B

	ZeroXClem/Stheno-Hercules-3.1-8B is an advanced model merge, combining the strengths of two state-of-the-art models using the powerful [mergekit](https://github.com/cg123/mergekit) framework. This model is designed to maximize performance by blending different architecture layers and leveraging cutting-edge interpolation techniques, bringing together the best of both worlds: Hercules and Stheno.

	## 🚀 Merged Models

	This model merge incorporates the following:

	- [Locutusque/Hercules-6.1-Llama-3.1-8B](https://huggingface.co/Locutusque/Hercules-6.1-Llama-3.1-8B): Known for its powerful attention mechanisms and deep neural layers, Hercules-6.1 serves as the base for this merge.
	- [Sao10K/Llama-3.1-8B-Stheno-v3.4](https://huggingface.co/Sao10K/Llama-3.1-8B-Stheno-v3.4): Complementing Hercules, Stheno-v3.4 contributes its refined, balanced network architecture for added depth and flexibility.

	## 🧩 Merge Configuration

	The configuration below outlines how the models are merged using spherical linear interpolation (SLERP), which allows for smooth transitions between the layers of both models, ensuring an optimal blend of their unique attributes:

	```yaml
	slices:
	- sources:
	- model: Locutusque/Hercules-6.1-Llama-3.1-8B
	layer_range: [0, 32]
	- model: Sao10K/Llama-3.1-8B-Stheno-v3.4
	layer_range: [0, 32]
	merge_method: slerp
	base_model: Locutusque/Hercules-6.1-Llama-3.1-8B
	parameters:
	t:
	- filter: self_attn
	value: [0, 0.5, 0.3, 0.7, 1] # Controls the blending of self-attention layers
	- filter: mlp
	value: [1, 0.5, 0.7, 0.3, 0] # Adjusts the blending across the MLP layers
	- value: 0.5 # Global merge weight for layers not specified by filters
	dtype: bfloat16 # Optimized for efficiency and performance

	```

	### Key Parameters

	- Self-Attention Filtering (`self_attn`): Controls the extent of blending across self-attention layers, ranging from full to partial utilization from both models at various levels.
	- MLP Filtering (`mlp`): Similar to self-attention, this filter applies to the Multi-Layer Perceptrons, fine-tuning the neural network’s layer balance.
	- Global Weight (`t.value`): A general interpolation factor for all layers not explicitly defined by the filters, set at 0.5 for an equal contribution from both models.
	- Data Type (`dtype`): Uses `bfloat16` to maintain computational efficiency while ensuring a high level of precision.

	## 🎯 Use Case & Applications

	ZeroXClem/Stheno-Hercules-3.1-8B is where imagination meets intelligence, a model built to seamlessly weave together the art of roleplay and the precision of science. With the raw power of Hercules fueling your creations and Stheno’s delicate balance guiding every interaction, this model thrives in:

	- Immersive storytelling and dynamic roleplaying: Craft rich, believable characters and worlds with unparalleled depth, emotional nuance, and narrative flow.
	- Scientific exploration and discovery: Unleash your mind’s full potential for complex problem-solving, hypothesis testing, and advanced AI-driven research.
	- Blending creativity and logic: A harmonious fusion of heart and intellect, this model handles anything from playful creativity to rigorous scientific applications.

	## 📜 License

	This model is open-sourced under the Apache-2.0 License.

	## 💡 Tags

	- `merge`
	- `mergekit`
	- `lazymergekit`
	- `Locutusque/Hercules-6.1-Llama-3.1-8B`
	- `Sao10K/Llama-3.1-8B-Stheno-v3.4`