AlekseiPravdin committed
Commit 335021f
1 parent: 27ac2cd

Upload folder using huggingface_hub

Files changed (1):
  1. README.md (+19 −47)
README.md CHANGED
@@ -10,48 +10,11 @@ tags:
 
 # KukulStanta-7B-Seamaiiza-7B-v1-slerp-merge
 
-KukulStanta-7B-Seamaiiza-7B-v1-slerp-merge is a merge of the following models using [mergekit](https://github.com/cg123/mergekit):
-* [Nitral-AI/KukulStanta-7B](https://huggingface.co/Nitral-AI/KukulStanta-7B)
-* [AlekseiPravdin/Seamaiiza-7B-v1](https://huggingface.co/AlekseiPravdin/Seamaiiza-7B-v1)
-
-## 🧩 Merge Configuration
-
-```yaml
-slices:
-  - sources:
-      - model: Nitral-AI/KukulStanta-7B
-        layer_range: [0, 31]
-      - model: AlekseiPravdin/Seamaiiza-7B-v1
-        layer_range: [0, 31]
-merge_method: slerp
-base_model: Nitral-AI/KukulStanta-7B
-parameters:
-  t:
-    - filter: self_attn
-      value: [0, 0.5, 0.3, 0.7, 1]
-    - filter: mlp
-      value: [1, 0.5, 0.7, 0.3, 0]
-    - value: 0.5
-dtype: float16
-```
-
----
-license: apache-2.0
-tags:
-- merge
-- mergekit
-- lazymergekit
-- Nitral-AI/KukulStanta-7B
-- AlekseiPravdin/Seamaiiza-7B-v1
----
-
-# KukulStanta-7B-Seamaiiza-7B-v1-slerp-merge
-
 KukulStanta-7B-Seamaiiza-7B-v1-slerp-merge is an advanced language model created through a strategic fusion of two distinct models: [Nitral-AI/KukulStanta-7B](https://huggingface.co/Nitral-AI/KukulStanta-7B) and [AlekseiPravdin/Seamaiiza-7B-v1](https://huggingface.co/AlekseiPravdin/Seamaiiza-7B-v1). The merging process was executed using [mergekit](https://github.com/cg123/mergekit), a specialized tool designed for precise model blending to achieve optimal performance and synergy between the merged architectures.
 
 ## 🧩 Merge Configuration
 
-The models were merged using the Spherical Linear Interpolation (SLERP) method, which ensures smooth interpolation between the two models across all layers. The base model chosen for this process was Nitral-AI/KukulStanta-7B, with parameters and configurations meticulously adjusted to harness the strengths of both source models.
+The models were merged using the Spherical Linear Interpolation (SLERP) method, which ensures smooth interpolation between the two models across all layers. The base model chosen for this process was [Nitral-AI/KukulStanta-7B], with parameters and configurations meticulously adjusted to harness the strengths of both source models.
 
 **Configuration:**
 
@@ -76,19 +39,28 @@ dtype: float16
 
 ## Model Features
 
-This fusion model combines the robust generative capabilities of Nitral-AI/KukulStanta-7B with the refined tuning of AlekseiPravdin/Seamaiiza-7B-v1, creating a versatile model suitable for a variety of text generation tasks. Leveraging the strengths of both parent models, KukulStanta-7B-Seamaiiza-7B-v1-slerp-merge provides enhanced context understanding, nuanced text generation, and improved performance across diverse NLP tasks.
+This fusion model combines the robust generative capabilities of [Nitral-AI/KukulStanta-7B] with the refined tuning of [AlekseiPravdin/Seamaiiza-7B-v1], creating a versatile model suitable for a variety of text generation tasks. Leveraging the strengths of both parent models, KukulStanta-7B-Seamaiiza-7B-v1-slerp-merge provides enhanced context understanding, nuanced text generation, and improved performance across diverse NLP tasks.
 
 ## Evaluation Results
 
-### Nitral-AI/KukulStanta-7B
-
-- **AI2 Reasoning Challenge (25-Shot):** 68.43 (normalized accuracy)
-- **HellaSwag (10-Shot):** 86.37 (normalized accuracy)
-- **MMLU (5-Shot):** 65.00 (accuracy)
-- **TruthfulQA (0-shot):** 62.19
-- **Winogrande (5-shot):** 80.03 (accuracy)
-- **GSM8k (5-shot):** 63.68 (accuracy)
+### KukulStanta-7B
+
+The evaluation results for [Nitral-AI/KukulStanta-7B](https://huggingface.co/Nitral-AI/KukulStanta-7B) are as follows:
+
+| Metric                            | Value |
+|-----------------------------------|-------|
+| Avg.                              | 70.95 |
+| AI2 Reasoning Challenge (25-Shot) | 68.43 |
+| HellaSwag (10-Shot)               | 86.37 |
+| MMLU (5-Shot)                     | 65.00 |
+| TruthfulQA (0-shot)               | 62.19 |
+| Winogrande (5-shot)               | 80.03 |
+| GSM8k (5-shot)                    | 63.68 |
+
+### Seamaiiza-7B-v1
+
+The evaluation results for [AlekseiPravdin/Seamaiiza-7B-v1](https://huggingface.co/AlekseiPravdin/Seamaiiza-7B-v1) are not provided in detail but are expected to complement the performance metrics of KukulStanta-7B, enhancing its capabilities in various text generation tasks.
 
 ## Limitations
 
-While the merged model benefits from the strengths of both parent models, it may also inherit certain limitations and biases. Users should be aware of potential biases present in the training data of the original models, which could affect the performance and fairness of the merged model in specific applications.
+While KukulStanta-7B-Seamaiiza-7B-v1-slerp-merge inherits the strengths of both parent models, it may also carry over some limitations or biases present in them. Users should be aware of potential biases in generated content and the need for careful evaluation in sensitive applications.
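
The SLERP method referenced in the README interpolates each pair of corresponding weight tensors along the great-circle arc between them, rather than along the straight line used by plain averaging. A minimal NumPy sketch of the operation (illustrative only, not mergekit's actual implementation; the function name and the near-parallel fallback are assumptions):

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.

    Falls back to linear interpolation when the vectors are nearly
    parallel, where the spherical formula is ill-conditioned.
    """
    v0 = np.asarray(v0, dtype=np.float64)
    v1 = np.asarray(v1, dtype=np.float64)
    # Cosine of the angle between the normalized vectors.
    dot = np.dot(v0 / np.linalg.norm(v0), v1 / np.linalg.norm(v1))
    if abs(dot) > 1.0 - eps:
        # Near-parallel vectors: plain linear interpolation.
        return (1.0 - t) * v0 + t * v1
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    # Weights follow the sine of the remaining / covered arc angle.
    return (np.sin((1.0 - t) * theta) * v0 + np.sin(t * theta) * v1) / np.sin(theta)
```

In the configuration above, the `t` lists under `filter: self_attn` and `filter: mlp` vary the interpolation factor across layer groups, so attention and MLP weights lean toward different parents at different depths, while the final `value: 0.5` applies to all remaining tensors.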