---
license: apache-2.0
tags:
- merge
- mergekit
- lazymergekit
- Locutusque/Hercules-6.1-Llama-3.1-8B
- Sao10K/Llama-3.1-8B-Stheno-v3.4
base_model:
- Locutusque/Hercules-6.1-Llama-3.1-8B
- Sao10K/Llama-3.1-8B-Stheno-v3.4
---
# ZeroXClem/Stheno-Hercules-3.1-8B
ZeroXClem/Stheno-Hercules-3.1-8B is an advanced model merge that combines the strengths of two domains, STEM and roleplay (RP), using the powerful mergekit framework. It is designed to maximize performance by blending layers from the two architectures with modern interpolation techniques, bringing together the best of both worlds: Hercules and Stheno.
## 🚀 Merged Models
This model merge incorporates the following:
- Locutusque/Hercules-6.1-Llama-3.1-8B: Known for its powerful attention mechanisms and deep neural layers, Hercules-6.1 serves as the base for this merge.
- Sao10K/Llama-3.1-8B-Stheno-v3.4: Complementing Hercules, Stheno-v3.4 contributes its refined, balanced network architecture for added depth and flexibility.
## 🧩 Merge Configuration
The configuration below outlines how the models are merged using spherical linear interpolation (SLERP), which allows for smooth transitions between the layers of both models, ensuring an optimal blend of their unique attributes:
```yaml
slices:
  - sources:
      - model: Locutusque/Hercules-6.1-Llama-3.1-8B
        layer_range: [0, 32]
      - model: Sao10K/Llama-3.1-8B-Stheno-v3.4
        layer_range: [0, 32]
merge_method: slerp
base_model: Locutusque/Hercules-6.1-Llama-3.1-8B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1] # Controls the blending of self-attention layers
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0] # Adjusts the blending across the MLP layers
    - value: 0.5 # Global merge weight for layers not specified by filters
dtype: bfloat16 # Optimized for efficiency and performance
```
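For intuition about the `slerp` merge method itself, the following is a minimal sketch of spherical linear interpolation applied to two flattened weight tensors. It is an illustrative approximation only: the tensor names are placeholders, and mergekit's actual implementation additionally handles per-layer gradients and edge cases internally.

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors.

    Illustrative sketch only: real merge code also falls back to plain
    linear interpolation when the tensors are nearly parallel.
    """
    v0_flat, v1_flat = v0.ravel(), v1.ravel()
    # Angle between the two weight vectors
    cos_theta = np.dot(v0_flat, v1_flat) / (
        np.linalg.norm(v0_flat) * np.linalg.norm(v1_flat) + eps
    )
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if theta < eps:
        # Nearly parallel tensors: SLERP degenerates to a simple lerp
        return (1 - t) * v0 + t * v1
    sin_theta = np.sin(theta)
    # Standard SLERP: the blend follows the arc between the two points
    return (np.sin((1 - t) * theta) / sin_theta) * v0 + (np.sin(t * theta) / sin_theta) * v1

# Toy example: blend two random "weight matrices" with equal contribution (t = 0.5)
rng = np.random.default_rng(0)
w_hercules = rng.normal(size=(4, 4))
w_stheno = rng.normal(size=(4, 4))
w_merged = slerp(0.5, w_hercules, w_stheno)
```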
### Key Parameters

- **Self-Attention Filtering (`self_attn`)**: Controls the extent of blending across the self-attention layers, ranging from full to partial utilization of both models at various depths.
- **MLP Filtering (`mlp`)**: Similar to self-attention, this filter applies to the multi-layer perceptron (MLP) blocks, fine-tuning the balance between the two networks (see the sketch after this list for how the gradient values map to individual layers).
- **Global Weight (`t.value`)**: A general interpolation factor for all layers not explicitly covered by the filters, set to 0.5 for an equal contribution from both models.
- **Data Type (`dtype`)**: Uses `bfloat16` to maintain computational efficiency while preserving a high level of precision.
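The five-element `value` lists for `self_attn` and `mlp` are gradients that mergekit spreads across the 32 layers of the slice, so early layers lean toward one model and later layers toward the other. The sketch below approximates that expansion with plain linear interpolation; the exact anchor placement and the convention that `t = 0` keeps the base model (Hercules) while `t = 1` takes Stheno's weights are assumptions for illustration, not a statement of mergekit's internals.

```python
import numpy as np

def expand_gradient(values: list[float], num_layers: int = 32) -> np.ndarray:
    """Approximate how a short gradient list maps to per-layer t values."""
    anchors = np.linspace(0, num_layers - 1, num=len(values))
    layers = np.arange(num_layers)
    return np.interp(layers, anchors, values)

self_attn_t = expand_gradient([0, 0.5, 0.3, 0.7, 1])  # self-attention blend per layer
mlp_t = expand_gradient([1, 0.5, 0.7, 0.3, 0])        # MLP blend per layer (mirrored)

# Assumed convention: t = 0 favors the base model (Hercules), t = 1 favors Stheno
print(self_attn_t.round(2))
```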
## 🎯 Use Case & Applications
ZeroXClem/Stheno-Hercules-3.1-8B is where imagination meets intelligence, a model built to seamlessly weave together the art of roleplay and the precision of science. With the raw power of Hercules fueling your creations and Stheno’s delicate balance guiding every interaction, this model thrives in:
- Immersive storytelling and dynamic roleplaying: Craft rich, believable characters and worlds with unparalleled depth, emotional nuance, and narrative flow.
- Scientific exploration and discovery: Unleash your mind’s full potential for complex problem-solving, hypothesis testing, and advanced AI-driven research.
- Blending creativity and logic: A harmonious fusion of heart and intellect, this model handles anything from playful creativity to rigorous scientific applications.
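To try the merged model, a minimal inference sketch with 🤗 Transformers is shown below. It assumes the repository ships the standard Llama 3.1 tokenizer and chat template and that `accelerate` is installed for `device_map="auto"`; adjust the dtype and generation settings for your hardware.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ZeroXClem/Stheno-Hercules-3.1-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the dtype used for the merge
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain spherical linear interpolation in one paragraph."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```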
## 📜 License
This model is open-sourced under the Apache-2.0 License.
## 💡 Tags

- `merge`
- `mergekit`
- `lazymergekit`
- `Locutusque/Hercules-6.1-Llama-3.1-8B`
- `Sao10K/Llama-3.1-8B-Stheno-v3.4`