---
base_model:
- sometimesanotion/Lamarck-14B-v0.6
- sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
library_name: transformers
tags:
- mergekit
- merge
license: apache-2.0
---
## Notes

This is an experiment to try the extra SLERP parameters @bamec66557 uses in bamec66557/Qwen-2.5-14B-MINUS, but with the models I'm currently working on. Do they make a difference when run through mergekit-gui? We'll see.
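For reference, the `slerp` merge method interpolates each pair of tensors along the arc between them rather than the straight line a plain weighted average takes, which better preserves the geometry of the weights when the two models point in different directions. Below is a minimal Python sketch of that math, not mergekit's actual implementation (which handles dtypes and edge cases more carefully):

```python
import torch

def slerp(t: float, v0: torch.Tensor, v1: torch.Tensor) -> torch.Tensor:
    """Spherically interpolate between two weight tensors.

    t = 0 returns v0 (the base model's tensor), t = 1 returns v1.
    Falls back to linear interpolation when the tensors are nearly
    colinear, where the spherical formula divides by ~0.
    """
    v0f = v0.flatten().double()
    v1f = v1.flatten().double()
    # Cosine of the angle between the two weight directions.
    dot = torch.dot(v0f / v0f.norm(), v1f / v1f.norm()).clamp(-1.0, 1.0)
    if dot.abs() > 0.9995:  # nearly colinear: plain lerp is stable here
        out = (1.0 - t) * v0f + t * v1f
    else:
        omega = torch.acos(dot)  # angle between the two directions
        so = torch.sin(omega)
        out = (torch.sin((1.0 - t) * omega) / so) * v0f \
            + (torch.sin(t * omega) / so) * v1f
    return out.reshape(v0.shape).to(v0.dtype)
```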
## Models Merged

The following models were included in the merge:

* [sometimesanotion/Qwen2.5-14B-Vimarckoso-v3](https://huggingface.co/sometimesanotion/Qwen2.5-14B-Vimarckoso-v3)
* [sometimesanotion/Lamarck-14B-v0.6](https://huggingface.co/sometimesanotion/Lamarck-14B-v0.6)
## Configuration

The following YAML configuration was used to produce this model:
```yaml
models:
  - model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
  - model: sometimesanotion/Lamarck-14B-v0.6
merge_method: slerp
base_model: sometimesanotion/Qwen2.5-14B-Vimarckoso-v3
dtype: float32
out_dtype: bfloat16
parameters:
  t: [0.3, 0.6, 0.8, 0.6, 0.3]

regularization:
  - method: gradient_penalty
    scale: 0.07
  - method: weight_clipping
    clip_range: [-0.2, 0.2]
  - method: random_noise
    scale: 0.005
  - method: attention_dropout
    scale: 0.03

postprocessing:
  - operation: entropy_regularization
    scale: 0.07
  - operation: non_linear_scaling
    parameters:
      function: gelu
  - operation: sharpening
    intensity: 0.7
  - operation: gaussian_smoothing
    sigma: 0.2
  - operation: normalize
  - operation: dynamic_scaling
    scale_range: [0.97, 1.03]
  - operation: smoothing
    parameters:
      adaptive: true
      range: [0.97, 1.03]
      kernel_size: 5
```
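The `t:` list is a curve rather than a single blend ratio: mergekit spreads those five anchor values across layer depth, so the ends of the stack stay closer to the base model (t = 0.3) while the middle layers lean toward Lamarck (t = 0.8). A rough sketch of that mapping, assuming 48 decoder layers (Qwen2.5-14B) and simple linear interpolation between anchors; mergekit's exact anchor-to-layer mapping may differ in detail:

```python
import numpy as np

anchors = [0.3, 0.6, 0.8, 0.6, 0.3]  # the t: values from the config above
num_layers = 48                      # assumed: Qwen2.5-14B decoder depth

# Interpolate the anchors linearly across normalized layer depth.
depth = np.linspace(0.0, 1.0, num_layers)
t_per_layer = np.interp(depth, np.linspace(0.0, 1.0, len(anchors)), anchors)

print(np.round(t_per_layer, 2))  # ~0.3 at the ends, peaking at 0.8 mid-stack
```

With mergekit installed, the config itself runs via `mergekit-yaml config.yaml ./output-directory`; whether the extra `regularization` and `postprocessing` blocks actually change the result is exactly what this experiment is checking.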