DavidAU committed on
Commit 1bf5008 · verified · 1 Parent(s): 2872643

Delete README.md

Files changed (1)
  1. README.md +0 -462
README.md DELETED
@@ -1,462 +0,0 @@
- ---
- base_model: []
- library_name: transformers
- tags:
- - mergekit
- - merge
-
- ---
- # L3.1-Deepseek-LLama-exp40-3
-
- This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-
- ## Merge Details
- ### Merge Method
-
- This model was merged using the passthrough merge method.
-
- ### Models Merged
-
- The following models were included in the merge:
- * D:/8b-deepseek
-
- ### Configuration
-
- The following YAML configuration was used to produce this model:
-
- ```yaml
- # Six splits plus "end game
- # "D" starts at plus .1 VS D/O proj.
- # 40 plus.
-
- slices:
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [0, 31]
-
- # conc layers
- # split 1
-
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.01
-           - filter: down_proj
-             value: 0.01
-           - value: 0.11
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.02
-           - filter: down_proj
-             value: 0.02
-           - value: 0.12
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.03
-           - filter: down_proj
-             value: 0.03
-           - value: 0.13
-
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.04
-           - filter: down_proj
-             value: 0.04
-           - value: 0.61
-
- # split 2, SURGE D THEN D drop .46, continues @ D .15 (from .13)
-
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.05
-           - filter: down_proj
-             value: 0.05
-           - value: 0.15
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.06
-           - filter: down_proj
-             value: 0.06
-           - value: 0.16
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.07
-           - filter: down_proj
-             value: 0.07
-           - value: 0.17
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.08
-           - filter: down_proj
-             value: 0.08
-           - value: 0.41
-
- # split 3, SURGE D to .41, D drop .21 ... follows .17 previous
-
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.09
-           - filter: down_proj
-             value: 0.09
-           - value: 0.19
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.10
-           - filter: down_proj
-             value: 0.10
-           - value: 0.20
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.11
-           - filter: down_proj
-             value: 0.11
-           - value: .22
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.12
-           - filter: down_proj
-             value: 0.12
-           - value: .24
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.13
-           - filter: down_proj
-             value: 0.13
-           - value: .26
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.14
-           - filter: down_proj
-             value: 0.14
-           - value: .28
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.15
-           - filter: down_proj
-             value: 0.15
-           - value: .30
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.16
-           - filter: down_proj
-             value: 0.16
-           - value: .31
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.20
-           - filter: down_proj
-             value: 0.20
-           - value: .32
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.21
-           - filter: down_proj
-             value: 0.21
-           - value: .33
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.22
-           - filter: down_proj
-             value: 0.22
-           - value: .34
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.23
-           - filter: down_proj
-             value: 0.23
-           - value: .35
-
- # split 4 , NO SURGE D, "D" down drop of .24 ; reverts to .11 (the very first "D" setting )
-
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.24
-           - filter: down_proj
-             value: 0.24
-           - value: 0.11
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.241
-           - filter: down_proj
-             value: 0.241
-           - value: 0.12
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.242
-           - filter: down_proj
-             value: 0.243
-           - value: 0.13
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.244
-           - filter: down_proj
-             value: 0.244
-           - value: 0.61
-
- # split 5, D Surge to .61, drop to .15 (following .13)
-
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.245
-           - filter: down_proj
-             value: 0.245
-           - value: 0.15
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.246
-           - filter: down_proj
-             value: 0.246
-           - value: 0.16
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.247
-           - filter: down_proj
-             value: 0.247
-           - value: 0.17
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.248
-           - filter: down_proj
-             value: 0.248
-           - value: 0.41
-
- # split 6, D surge to .41 , then follows .17
-
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.249
-           - filter: down_proj
-             value: 0.249
-           - value: 0.19
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.250
-           - filter: down_proj
-             value: 0.250
-           - value: 0.20
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.251
-           - filter: down_proj
-             value: 0.251
-           - value: .22
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.252
-           - filter: down_proj
-             value: 0.252
-           - value: .24
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.253
-           - filter: down_proj
-             value: 0.254
-           - value: .26
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.255
-           - filter: down_proj
-             value: 0.255
-           - value: .28
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.256
-           - filter: down_proj
-             value: 0.256
-           - value: .30
-
- # O PROJ, DPROJ to .3333 /
- # end game
-
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.3333333333333
-           - filter: down_proj
-             value: 0.3333333333333
-           - value: 0.3333333333333
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.4444444444444
-           - filter: down_proj
-             value: 0.4444444444444
-           - value: 0.4444444444444
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.5555555555555
-           - filter: down_proj
-             value: 0.5555555555555
-           - value: 0.5555555555555
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.6666666666666
-           - filter: down_proj
-             value: 0.6666666666666
-           - value: 0.6666666666666
-   - sources:
-     - model: D:/8b-deepseek
-       layer_range: [31,32]
-       parameters:
-         scale:
-           - filter: o_proj
-             value: 0.777777777777
-           - filter: down_proj
-             value: 0.777777777777
-           - value: 0.888888888888
- merge_method: passthrough
- dtype: float16
- ```
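The deleted configuration stacks source layers 0 to 30 once (`layer_range: [0, 31]`) and then appends 40 single-layer slices of layer 31 (`layer_range: [31,32]`), each with its own `scale` settings. The Python sketch below re-encodes that schedule as plain data, so it needs neither mergekit nor a YAML parser, and derives two things from it: the resulting layer layout (31 + 40 = 71 layers) and the scale a given tensor in one of the copies would receive. The names `EXTRA_SLICES`, `layer_layout`, `scale_rules` and `resolve_scale` are hypothetical, and the filter semantics (rules checked in order, `filter` treated as a substring test on the tensor name, a rule without `filter` matching everything) are an assumption about how mergekit evaluates `scale`, not something the README states.

```python
# Sketch (not mergekit code): re-encode the deleted config's slice schedule
# as plain Python data and derive the layer layout and per-tensor scales.
from typing import List, Optional, Tuple

# (o_proj, down_proj, unfiltered) scale values of the 40 single-layer
# slices, copied in config order; every one uses layer_range [31,32].
EXTRA_SLICES: List[Tuple[float, float, float]] = [
    # split 1
    (0.01, 0.01, 0.11), (0.02, 0.02, 0.12), (0.03, 0.03, 0.13), (0.04, 0.04, 0.61),
    # split 2
    (0.05, 0.05, 0.15), (0.06, 0.06, 0.16), (0.07, 0.07, 0.17), (0.08, 0.08, 0.41),
    # split 3
    (0.09, 0.09, 0.19), (0.10, 0.10, 0.20), (0.11, 0.11, 0.22), (0.12, 0.12, 0.24),
    (0.13, 0.13, 0.26), (0.14, 0.14, 0.28), (0.15, 0.15, 0.30), (0.16, 0.16, 0.31),
    (0.20, 0.20, 0.32), (0.21, 0.21, 0.33), (0.22, 0.22, 0.34), (0.23, 0.23, 0.35),
    # split 4
    (0.24, 0.24, 0.11), (0.241, 0.241, 0.12), (0.242, 0.243, 0.13), (0.244, 0.244, 0.61),
    # split 5
    (0.245, 0.245, 0.15), (0.246, 0.246, 0.16), (0.247, 0.247, 0.17), (0.248, 0.248, 0.41),
    # split 6
    (0.249, 0.249, 0.19), (0.250, 0.250, 0.20), (0.251, 0.251, 0.22), (0.252, 0.252, 0.24),
    (0.253, 0.254, 0.26), (0.255, 0.255, 0.28), (0.256, 0.256, 0.30),
    # end game
    (0.3333333333333, 0.3333333333333, 0.3333333333333),
    (0.4444444444444, 0.4444444444444, 0.4444444444444),
    (0.5555555555555, 0.5555555555555, 0.5555555555555),
    (0.6666666666666, 0.6666666666666, 0.6666666666666),
    (0.777777777777, 0.777777777777, 0.888888888888),
]


def layer_layout() -> List[int]:
    """Source layer index for each output layer of the passthrough stack."""
    # layer_range is half-open: [0, 31] copies source layers 0..30 once,
    # and each [31,32] slice appends one more copy of source layer 31.
    return list(range(0, 31)) + [31] * len(EXTRA_SLICES)


def scale_rules(entry: Tuple[float, float, float]) -> List[Tuple[Optional[str], float]]:
    """Turn one (o_proj, down_proj, unfiltered) triple into ordered rules."""
    o_proj, down_proj, default = entry
    return [("o_proj", o_proj), ("down_proj", down_proj), (None, default)]


def resolve_scale(tensor_name: str, rules: List[Tuple[Optional[str], float]]) -> float:
    """Assumed semantics: the first rule whose filter is a substring of the
    tensor name wins; a rule with no filter matches every tensor."""
    for flt, value in rules:
        if flt is None or flt in tensor_name:
            return value
    return 1.0  # no rule matched: leave the tensor unscaled


if __name__ == "__main__":
    layout = layer_layout()
    print(len(layout))  # 71: 31 original layers + 40 copies of layer 31
    rules = scale_rules(EXTRA_SLICES[0])
    for name in ("self_attn.o_proj.weight", "mlp.down_proj.weight", "mlp.up_proj.weight"):
        print(name, resolve_scale(f"model.layers.31.{name}", rules))
```

Read this way, the `o_proj` and `down_proj` scales rise monotonically from 0.01 to about 0.78 across the 40 copies, while the unfiltered value (which would also cover the layer norms and the q/k/v, gate and up projections) jumps around between 0.11 and 0.89. Since `o_proj` and `down_proj` produce the outputs that a Llama block adds back to the residual stream, small values there presumably keep the early copies close to an identity pass; the sketch does not test that. As written, the config also never includes source layer 31 at unit scale.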