TeeZee committed
Commit c37eba1 · verified · 1 Parent(s): 6e10dc1

Update README.md

Files changed (1):
  1. README.md +91 -91
README.md CHANGED
@@ -1,91 +1,91 @@
 ---
 base_model: []
 library_name: transformers
 tags:
 - mergekit
 - merge
-
+license: apache-2.0
 ---
-# 2x_bagel-34b-v0.2
+# DoubleBagel-57B-v1.0
 
 This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
 
 ## Merge Details
 ### Merge Method
 
 This model was merged using the passthrough merge method.
 
 ### Models Merged
 
 The following models were included in the merge:
 * ./jondurbin_bagel-34b-v0.2
 
 ### Configuration
 
 The following YAML configuration was used to produce this model:
 
 ```yaml
 dtype: float32
 merge_method: passthrough
 slices:
 - sources:
   - layer_range: [0, 20]
     model: "jondurbin_bagel-34b-v0.2"
     parameters:
       scale:
       - filter: q_proj
         value: 0.7071067812
       - filter: k_proj
         value: 0.7071067812
       - value: 1
 - sources:
   - layer_range: [10, 30]
     model: "jondurbin_bagel-34b-v0.2"
     parameters:
       scale:
       - filter: q_proj
         value: 0.7071067812
       - filter: k_proj
         value: 0.7071067812
       - value: 1
 - sources:
   - layer_range: [20, 40]
     model: "jondurbin_bagel-34b-v0.2"
     parameters:
       scale:
       - filter: q_proj
         value: 0.7071067812
       - filter: k_proj
         value: 0.7071067812
       - value: 1
 - sources:
   - layer_range: [30, 50]
     model: "jondurbin_bagel-34b-v0.2"
     parameters:
       scale:
       - filter: q_proj
         value: 0.7071067812
       - filter: k_proj
         value: 0.7071067812
       - value: 1
 - sources:
   - layer_range: [40, 60]
     model: "jondurbin_bagel-34b-v0.2"
     parameters:
       scale:
       - filter: q_proj
         value: 0.7071067812
       - filter: k_proj
         value: 0.7071067812
       - value: 1
 name: 2xbagel_fp32
 ---
 dtype: bfloat16
 merge_method: passthrough
 slices:
 - sources:
   - layer_range: [0, 100]
     model: 2xbagel_fp32
 name: bagel_new
 
 ```
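For readers unpacking the config: the first YAML document stacks five overlapping 20-layer windows of the 60-layer base model, so the intermediate `2xbagel_fp32` ends up 100 layers deep, with the middle source layers duplicated. A quick sketch of that stacking (the ranges are copied verbatim from the config; the variable names are illustrative only):

```python
from collections import Counter

# layer_range windows copied from the passthrough config above
ranges = [(0, 20), (10, 30), (20, 40), (30, 50), (40, 60)]

# Passthrough concatenates the windows in order, so the merged
# model's depth is simply the sum of the window lengths.
stacked = [layer for start, end in ranges for layer in range(start, end)]
counts = Counter(stacked)

print(len(stacked))           # 100 layers in the merged stack
print(counts[0], counts[15])  # source layers 0-9 appear once, layers 10-49 twice
```

Duplicating forty of the sixty source layers is what grows the 34B base toward the parameter count suggested by the new model name.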
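The recurring scale value 0.7071067812 is 1/sqrt(2) (an observation about the config, not something stated in the card). Since attention logits are proportional to the dot product q . k, scaling both `q_proj` and `k_proj` by 1/sqrt(2) multiplies every pre-softmax logit in the duplicated layers by 1/2, presumably to soften attention in the self-merged stack. A quick numerical check:

```python
import math

scale = 0.7071067812  # value applied to q_proj and k_proj in the config

# Scaling both projections scales each logit by scale ** 2.
assert math.isclose(scale, 1 / math.sqrt(2), rel_tol=1e-9)
print(round(scale ** 2, 6))  # 0.5 -> logits in the duplicated layers are halved
```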