ZeroXClem committed on
Commit c162c00
Parent(s): 7d023b9

Update README.md

Files changed (1): README.md (+46 −9)

README.md CHANGED
tags:
- lazymergekit
- Locutusque/Hercules-6.1-Llama-3.1-8B
- Sao10K/Llama-3.1-8B-Stheno-v3.4
base_model:
- Locutusque/Hercules-6.1-Llama-3.1-8B
---

# ZeroXClem/Stheno-Hercules-3.1-8B

ZeroXClem/Stheno-Hercules-3.1-8B is a merge of two Llama-3.1-8B fine-tunes, built with the [mergekit](https://github.com/cg123/mergekit) framework. The merge blends the two parent models layer by layer with interpolated weights, aiming to combine the strengths of both: **Hercules** and **Stheno**.

## 🚀 Merged Models

This merge incorporates the following models:

- [**Locutusque/Hercules-6.1-Llama-3.1-8B**](https://huggingface.co/Locutusque/Hercules-6.1-Llama-3.1-8B): serves as the base model for the merge.
- [**Sao10K/Llama-3.1-8B-Stheno-v3.4**](https://huggingface.co/Sao10K/Llama-3.1-8B-Stheno-v3.4): merged into the Hercules base, contributing its creative, roleplay-oriented tuning.

## 🧩 Merge Configuration

The configuration below outlines how the models are merged using **spherical linear interpolation (SLERP)**. Rather than averaging weights along a straight line, SLERP interpolates along the arc between the two models' weight vectors, which gives smooth transitions between the layers of both models.

```yaml
slices:
  # … (slice definition elided in the diff view)
base_model: Locutusque/Hercules-6.1-Llama-3.1-8B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1] # Controls the blending of self-attention layers
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0] # Adjusts the blending across the MLP layers
    - value: 0.5 # Global merge weight for layers not specified by filters
dtype: bfloat16 # Optimized for efficiency and performance
```
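For intuition, the SLERP operation applied to a single pair of weight vectors can be sketched in plain Python. This is an illustrative sketch only, not mergekit's implementation; real merges operate on full model tensors:

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.

    t = 0 returns v0 (the base model's weights), t = 1 returns v1;
    intermediate t moves along the arc between them on the
    hypersphere instead of the straight line used by plain averaging.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors, clamped for safety.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    theta = math.acos(dot)
    if theta < eps:
        # Nearly parallel vectors: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

# t = 0 keeps the first model's weights unchanged.
print(slerp(0.0, [1.0, 0.0], [0.0, 1.0]))  # → [1.0, 0.0]
```

The norm-preserving arc is why SLERP is preferred over naive weight averaging for merging: intermediate blends keep roughly the same scale as the parents.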

### Key Parameters

- **Self-Attention Filtering** (`self_attn`): controls how much each group of self-attention layers takes from each model, with `t` ranging from 0 (all base model) to 1 (all Stheno) at different depths.
- **MLP Filtering** (`mlp`): the same idea applied to the MLP (feed-forward) layers, with the gradient reversed relative to self-attention.
- **Global Weight** (`t.value`): the interpolation factor for all tensors not matched by a filter, set to 0.5 for an equal contribution from both models.
- **Data Type** (`dtype`): `bfloat16` keeps memory and compute costs low while maintaining adequate numerical precision.
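The five-element `value` lists act as a gradient over model depth. The sketch below shows one plausible expansion into per-layer weights, assuming simple linear interpolation between the anchor points; mergekit's exact interpolation scheme may differ:

```python
def layer_t(values, num_layers):
    """Expand a gradient list such as [0, 0.5, 0.3, 0.7, 1] into one
    interpolation weight per layer by linearly interpolating between
    the anchor points (an illustrative assumption, not mergekit's
    documented behavior).
    """
    ts = []
    for i in range(num_layers):
        # Position of layer i along the gradient, in [0, len(values) - 1].
        pos = i / max(num_layers - 1, 1) * (len(values) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        ts.append(values[lo] * (1 - frac) + values[hi] * frac)
    return ts

# With five layers, the five anchors land exactly on the layers.
print(layer_t([0, 0.5, 0.3, 0.7, 1], 5))  # → [0.0, 0.5, 0.3, 0.7, 1.0]
```

With more layers than anchors, intermediate layers receive blended weights, so the self-attention gradient `[0, 0.5, 0.3, 0.7, 1]` shifts gradually from the base model at the first layers toward Stheno at the last.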

## 🎯 Use Case & Applications

**ZeroXClem/Stheno-Hercules-3.1-8B** aims to pair the **art of roleplay** with the **precision of science**: Hercules supplies depth and reasoning power, while Stheno brings balance and expressiveness to every interaction. The model is well suited to:

- **Immersive storytelling and dynamic roleplaying**: crafting rich, believable characters and worlds with depth, emotional nuance, and narrative flow.
- **Scientific and analytical work**: complex problem-solving, hypothesis exploration, and research assistance.
- **Blending creativity and logic**: tasks that mix playful creativity with rigorous, structured reasoning.

## 📜 License

This model is open-sourced under the **Apache-2.0 License**.

## 💡 Tags

- `merge`
- `mergekit`
- `lazymergekit`
- `Locutusque/Hercules-6.1-Llama-3.1-8B`
- `Sao10K/Llama-3.1-8B-Stheno-v3.4`