Update README.md

README.md

---
tags:
- lazymergekit
- Locutusque/Hercules-6.1-Llama-3.1-8B
- Sao10K/Llama-3.1-8B-Stheno-v3.4
base_model:
- Locutusque/Hercules-6.1-Llama-3.1-8B
---

# ZeroXClem/Stheno-Hercules-3.1-8B

ZeroXClem/Stheno-Hercules-3.1-8B is a merge of two state-of-the-art Llama-3.1-8B fine-tunes, built with the [mergekit](https://github.com/cg123/mergekit) framework. It blends the two models layer by layer using spherical linear interpolation, bringing together the best of both worlds: **Hercules** and **Stheno**.

## 🚀 Merged Models

This merge incorporates the following models:

- [**Locutusque/Hercules-6.1-Llama-3.1-8B**](https://huggingface.co/Locutusque/Hercules-6.1-Llama-3.1-8B): Serves as the base model for the merge; the interpolation anchors on its weights.
- [**Sao10K/Llama-3.1-8B-Stheno-v3.4**](https://huggingface.co/Sao10K/Llama-3.1-8B-Stheno-v3.4): Complements Hercules, contributing its refined, balanced tuning for added depth and flexibility.

## 🧩 Merge Configuration

The models are merged using **spherical linear interpolation (SLERP)**, which interpolates smoothly between the corresponding weight tensors of the two models, layer by layer. The configuration below shows the relevant parameters:

```yaml
slices:
  # (source model and layer_range entries are not shown in this diff view)
base_model: Locutusque/Hercules-6.1-Llama-3.1-8B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1] # Controls the blending of self-attention layers
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0] # Adjusts the blending across the MLP layers
    - value: 0.5 # Global merge weight for layers not matched by a filter
dtype: bfloat16 # Compute/storage dtype for the merged weights
```

### Key Parameters

- **Self-Attention Filtering** (`self_attn`): Controls how strongly each model contributes to the self-attention layers, with the interpolation factor varying from one end of the network to the other.
- **MLP Filtering** (`mlp`): Applies the same idea to the MLP (feed-forward) layers, fine-tuning the balance between the two networks.
- **Global Weight** (`t.value`): The interpolation factor for all tensors not matched by a filter, set to 0.5 for an equal contribution from both models.
- **Data Type** (`dtype`): `bfloat16` keeps the merge computationally efficient while preserving sufficient precision.
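
For intuition, the sketch below shows the interpolation that each `t` value drives. This is a simplified, illustrative version of SLERP, not mergekit's internal code; to actually run the merge, a config like the one above is passed to mergekit's `mergekit-yaml` command.

```python
import numpy as np

def slerp(t: float, v0: np.ndarray, v1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two (flattened) weight tensors."""
    # Normalize copies to measure the angle between the two weight directions.
    v0_u = v0 / (np.linalg.norm(v0) + eps)
    v1_u = v1 / (np.linalg.norm(v1) + eps)
    omega = np.arccos(np.clip(np.dot(v0_u, v1_u), -1.0, 1.0))
    if np.sin(omega) < eps:  # nearly parallel: fall back to plain linear interpolation
        return (1.0 - t) * v0 + t * v1
    # Interpolate along the arc between the tensors rather than the straight line.
    return (np.sin((1.0 - t) * omega) * v0 + np.sin(t * omega) * v1) / np.sin(omega)

# t = 0 keeps the first tensor, t = 1 the second, and t = 0.5 blends them
# equally, matching the global `- value: 0.5` entry in the config above.
a, b = np.random.randn(64), np.random.randn(64)
merged = slerp(0.5, a, b)
```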

## 🎯 Use Case & Applications

**ZeroXClem/Stheno-Hercules-3.1-8B** is built to weave together the **art of roleplay** and the **precision of science**. With the raw power of Hercules fueling generation and Stheno's balance shaping every interaction, the model is aimed at:

- **Immersive storytelling and dynamic roleplaying**: Craft rich, believable characters and worlds with depth, emotional nuance, and narrative flow.
- **Scientific exploration and reasoning**: Complex problem-solving, hypothesis testing, and research-oriented question answering.
- **Blending creativity and logic**: A fusion of heart and intellect for anything from playful creativity to rigorous analytical work.
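
## 💻 Usage

A minimal inference sketch using 🤗 Transformers, assuming the merged model is published on the Hugging Face Hub under this repo id (`device_map="auto"` additionally requires the `accelerate` package):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ZeroXClem/Stheno-Hercules-3.1-8B"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="bfloat16",  # matches the dtype used for the merge
    device_map="auto",
)

prompt = "Write the opening scene of a mystery set aboard a deep-sea research vessel."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```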

## 📜 License

This model is open-sourced under the **Apache-2.0 License**.

## 💡 Tags

- `merge`
- `mergekit`
- `lazymergekit`
- `Locutusque/Hercules-6.1-Llama-3.1-8B`
- `Sao10K/Llama-3.1-8B-Stheno-v3.4`