|
--- |
|
license: apache-2.0 |
|
tags: |
|
- merge |
|
- mergekit |
|
- lazymergekit |
|
- Locutusque/Hercules-6.1-Llama-3.1-8B |
|
- Sao10K/Llama-3.1-8B-Stheno-v3.4 |
|
base_model: |
|
- Locutusque/Hercules-6.1-Llama-3.1-8B |
|
--- |
|
|
|
|
# ZeroXClem/Stheno-Hercules-3.1-8B |
|
|
|
ZeroXClem/Stheno-Hercules-3.1-8B is a merge of two Llama 3.1 8B fine-tunes, built with the [mergekit](https://github.com/cg123/mergekit) framework. It blends the corresponding transformer layers of **Hercules** and **Stheno** using spherical linear interpolation, aiming to combine Hercules' technical depth with Stheno's conversational and creative strengths.
|
|
|
## 🚀 Merged Models |
|
|
|
This model merge incorporates the following: |
|
|
|
- [**Locutusque/Hercules-6.1-Llama-3.1-8B**](https://huggingface.co/Locutusque/Hercules-6.1-Llama-3.1-8B): The base model for this merge, an instruction-tuned Llama 3.1 8B that anchors the blend and supplies its more technical, task-focused side.

- [**Sao10K/Llama-3.1-8B-Stheno-v3.4**](https://huggingface.co/Sao10K/Llama-3.1-8B-Stheno-v3.4): A creative, roleplay-oriented Llama 3.1 8B fine-tune that complements Hercules with conversational depth and flexibility.
|
|
|
## 🧩 Merge Configuration |
|
|
|
The configuration below shows how the models are merged using **spherical linear interpolation (SLERP)**, which interpolates between corresponding weight tensors along an arc rather than a straight line, giving smooth transitions between the two models' layers:
|
|
|
```yaml
slices:
  - sources:
      - model: Locutusque/Hercules-6.1-Llama-3.1-8B
        layer_range: [0, 32]
      - model: Sao10K/Llama-3.1-8B-Stheno-v3.4
        layer_range: [0, 32]
merge_method: slerp
base_model: Locutusque/Hercules-6.1-Llama-3.1-8B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]  # Controls the blending of self-attention layers
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]  # Adjusts the blending across the MLP layers
    - value: 0.5                    # Global merge weight for layers not specified by filters
dtype: bfloat16                     # Optimized for efficiency and performance
```
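To reproduce the merge, the snippet below is a minimal sketch using mergekit's Python entry points (`MergeConfiguration`, `MergeOptions`, `run_merge`). It assumes mergekit is installed and the YAML above is saved as `config.yaml`; the names and options follow mergekit's documented interface and may differ slightly between versions. The `mergekit-yaml` CLI can be used instead.

```python
# Minimal sketch: reproducing the merge from the YAML config above.
# Assumes `pip install mergekit` and that the config is saved as config.yaml.
import torch
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yaml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./Stheno-Hercules-3.1-8B",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),  # use a GPU for the merge if one is available
        copy_tokenizer=True,             # copy the base model's tokenizer into the output
        lazy_unpickle=True,              # stream weights from disk to keep RAM usage low
    ),
)
```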
|
|
|
### Key Parameters |
|
|
|
- **Self-Attention Filtering** (`self_attn`): The `value` list sets the interpolation factor `t` at evenly spaced points across the self-attention layers: `0` keeps the base model (Hercules), `1` takes Stheno, and intermediate values blend the two, so early layers lean toward Hercules and later layers toward Stheno.

- **MLP Filtering** (`mlp`): Applies the mirrored curve to the feed-forward (MLP) weights, so layers that lean toward Stheno for attention lean toward Hercules for the MLP, and vice versa.

- **Global Weight (`t.value`)**: The interpolation factor for any tensor not matched by a filter, set to `0.5` for an equal contribution from both models; the sketch after this list illustrates what a single SLERP step with a given `t` does.

- **Data Type (`dtype`)**: The merged weights are saved in `bfloat16`, keeping memory and compute costs low while preserving the numerical range of the original weights.
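For intuition, here is a minimal sketch of what a single SLERP step does for one pair of weight tensors. It illustrates the interpolation the `t` values control and is not mergekit's internal implementation; the fallback to plain linear interpolation for nearly parallel tensors is a common convention.

```python
import torch

def slerp(t: float, a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors.

    t = 0 returns `a` (the base model, Hercules), t = 1 returns `b` (Stheno);
    intermediate values move along the arc between them instead of a straight line.
    """
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    a_unit = a_flat / (a_flat.norm() + eps)
    b_unit = b_flat / (b_flat.norm() + eps)
    dot = torch.clamp(a_unit @ b_unit, -1.0, 1.0)
    omega = torch.acos(dot)            # angle between the two tensors
    if omega.abs() < 1e-4:             # nearly parallel: fall back to linear interpolation
        mixed = (1 - t) * a_flat + t * b_flat
    else:
        sin_omega = torch.sin(omega)
        mixed = (torch.sin((1 - t) * omega) / sin_omega) * a_flat \
              + (torch.sin(t * omega) / sin_omega) * b_flat
    return mixed.reshape(a.shape).to(a.dtype)

# Example: t = 0.5 gives an equal spherical blend of two (random, illustrative) tensors.
merged = slerp(0.5, torch.randn(4096, 4096), torch.randn(4096, 4096))
```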
|
|
|
## 🎯 Use Case & Applications |
|
|
|
**ZeroXClem/Stheno-Hercules-3.1-8B** is where **imagination meets intelligence**, a model built to seamlessly weave together the **art of roleplay** and the **precision of science**. With the raw power of Hercules fueling your creations and Stheno’s delicate balance guiding every interaction, this model thrives in: |
|
|
|
- **Immersive storytelling and dynamic roleplaying**: Craft rich, believable characters and worlds with unparalleled depth, emotional nuance, and narrative flow. |
|
- **Scientific exploration and discovery**: Unleash your mind’s full potential for complex problem-solving, hypothesis testing, and advanced AI-driven research. |
|
- **Blending creativity and logic**: A harmonious fusion of heart and intellect, this model handles anything from playful creativity to rigorous scientific applications. |
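## 💻 Usage

The snippet below is a minimal inference sketch with 🤗 Transformers. It assumes the model is available on the Hugging Face Hub as `ZeroXClem/Stheno-Hercules-3.1-8B` and that a CUDA-capable GPU is present; the prompt and sampling settings are illustrative only.

```python
import torch
from transformers import AutoTokenizer, pipeline

model_id = "ZeroXClem/Stheno-Hercules-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # matches the dtype the merge was exported in
    device_map="auto",           # spread the model across available devices
)

# Llama 3.1 tokenizers ship a chat template, so format the prompt through it.
messages = [{"role": "user", "content": "Introduce yourself in character as a ship's AI."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
print(outputs[0]["generated_text"])
```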
|
|
|
## 📜 License |
|
|
|
This model is open-sourced under the **Apache-2.0 License**. |
|
|
|
## 💡 Tags |
|
|
|
- `merge` |
|
- `mergekit` |
|
- `lazymergekit` |
|
- `Locutusque/Hercules-6.1-Llama-3.1-8B` |
|
- `Sao10K/Llama-3.1-8B-Stheno-v3.4` |