---
library_name: transformers
tags:
- mergekit
- merge
license: apache-2.0
base_model:
- Qwen/Qwen2.5-7B
- jeffmeloy/Qwen2.5-7B-nerd-uncensored-v1.0
- jeffmeloy/Qwen2.5-7B-nerd-uncensored-v1.8
- jeffmeloy/Qwen2.5-7B-nerd-uncensored-v0.9
- jeffmeloy/jeffmeloy_Qwen2.5-7B-minperplexity-1
- fblgit/cybertron-v4-qw7B-UNAMGS
- sethuiyer/Qwen2.5-7B-Anvita
language:
- en
---

![Kythera.webp](https://huggingface.co/sometimesanotion/KytheraMix-7B-v0.2/resolve/main/Kythera.webp)

---

KytheraMix-7B is crafted using semi-automated merges defined in YAML templates. Like AgoraMix, two merge trees converge: one for instruction following and one for reasoning, and a SLERP merge blends them with a layer-wise gradient.
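The final SLERP stage (shown in the configuration below) gives each slice of the reasoning branch a two-point `t` gradient, so the blend shifts gradually across depth. The sketch below is illustrative only, not mergekit's internals: it assumes a simple linear ramp within each slice, with `t = 0.00` keeping the instruction-following base and `t = 1.00` taking the reasoning branch.

```python
import numpy as np

# Illustration only (not mergekit internals): expand the per-slice two-point
# t gradients from the agoramix-7b-finalize-slerp stage into per-layer blend
# weights, assuming a linear ramp within each slice.
# t = 0.00 keeps the base (newsbang/Homer-v0.5-Qwen2.5-7B);
# t = 1.00 would take the reasoning branch (merges/agoramix-7b-reason-della).
slices = [
    ((0, 2),   (0.00, 0.00)),
    ((2, 7),   (0.00, 0.50)),
    ((7, 14),  (0.50, 0.70)),
    ((14, 21), (0.70, 0.70)),
    ((21, 28), (0.70, 0.40)),
]

for (start, end), (t0, t1) in slices:
    for layer, t in zip(range(start, end), np.linspace(t0, t1, end - start)):
        print(f"layer {layer:2d}: t = {t:.2f}")
```

Early layers stay close to the instruction-following base, the middle layers lean most heavily on the reasoning branch, and the blend eases back toward the base near the output layers.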
### Ancestor Models

- **[newsbang/Homer-v0.5-Qwen2.5-7B](http://huggingface.co/newsbang/Homer-v0.5-Qwen2.5-7B)** - For now, this is the sole contributor to the instruction-following side of KytheraMix.

- **[sethuiyer/Qwen2.5-7B-Anvita](http://huggingface.co/sethuiyer/Qwen2.5-7B-Anvita)** - Itself well-rounded for instruction following and reasoning.

- **[jeffmeloy/jeffmeloy_Qwen2.5-7B-minperplexity-1](http://huggingface.co/jeffmeloy/jeffmeloy_Qwen2.5-7B-minperplexity-1)** - Strong knowledge and recall, thanks to a composition of the lowest-perplexity layers from many models.

- **jeffmeloy/Qwen2.5-7B-nerd-uncensored-ties** - A model_stock and TIES merge of [jeffmeloy/Qwen2.5-7B-nerd-uncensored-v0.9](http://huggingface.co/jeffmeloy/Qwen2.5-7B-nerd-uncensored-v0.9), [jeffmeloy/Qwen2.5-7B-nerd-uncensored-v1.0](http://huggingface.co/jeffmeloy/Qwen2.5-7B-nerd-uncensored-v1.0), and [jeffmeloy/Qwen2.5-7B-nerd-uncensored-v1.8](http://huggingface.co/jeffmeloy/Qwen2.5-7B-nerd-uncensored-v1.8). These models are themselves the product of ner_merge, which selects layers from many other merges.

- **[fblgit/cybertron-v4-qw7B-UNAMGS](http://huggingface.co/fblgit/cybertron-v4-qw7B-UNAMGS)** - Strong coding and knowledge representation.

### Models Merged

The following YAML configuration was used to produce this model:

```yaml
name: agoramix-7b-if-della       # This contributes instruction following
merge_method: della
base_model: Qwen/Qwen2.5-7B
tokenizer_source: base
parameters:
  int8_mask: false
  normalize: true
  rescale: false
  density: 0.30
  weight: 0.50
  epsilon: 0.09
  lambda: 0.95
models:
  - model: newsbang/Homer-v0.5-Qwen2.5-7B   # Exceptional instruction following, coding, math
    parameters:
      density: 0.80
      weight: 1.00
  - model: sethuiyer/Qwen2.5-7B-Anvita      # Good instruction following, combined with exceptional recall and reasoning
    parameters:
      density: 0.30
      weight: [ 0.00, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40, 0.30, 0.30 ]
dtype: bfloat16
out_dtype: bfloat16
---
name: agoramix-7b-reason-della   # This contributes the knowledge and reasoning pool, later to be merged
merge_method: della              # with the dominant instruction-following model
base_model: Qwen/Qwen2.5-7B
tokenizer_source: base
parameters:
  int8_mask: false
  normalize: true
  rescale: false
  density: 0.30
  weight: 0.50
  epsilon: 0.08
  lambda: 0.95
models:
  - model: jeffmeloy/Qwen2.5-7B-nerd-uncensored-ties/
    parameters:
      density: 0.80
      weight: 1.00
  - model: jeffmeloy/jeffmeloy_Qwen2.5-7B-minperplexity-1
    parameters:
      density: 0.60
      weight: 0.50
  - model: fblgit/cybertron-v4-qw7B-UNAMGS
    parameters:
      density: 0.40
      weight: 0.50
dtype: bfloat16
out_dtype: bfloat16
---
name: agoramix-7b-finalize-slerp
merge_method: slerp
base_model: newsbang/Homer-v0.5-Qwen2.5-7B   # Excellent instruction-following
tokenizer_source: base
parameters:
  t: [ 0.00, 0.50, 0.60, 0.70, 0.70, 0.70, 0.70, 0.55, 0.40 ]
slices:
  - sources:
      - layer_range: [ 0, 2 ]
        model: newsbang/Homer-v0.5-Qwen2.5-7B
      - layer_range: [ 0, 2 ]
        model: merges/agoramix-7b-reason-della
        t: [ 0.00, 0.00 ]
  - sources:
      - layer_range: [ 2, 7 ]
        model: newsbang/Homer-v0.5-Qwen2.5-7B
      - layer_range: [ 2, 7 ]
        model: merges/agoramix-7b-reason-della
        t: [ 0.00, 0.50 ]
  - sources:
      - layer_range: [ 7, 14 ]
        model: newsbang/Homer-v0.5-Qwen2.5-7B
      - layer_range: [ 7, 14 ]
        model: merges/agoramix-7b-reason-della
        t: [ 0.50, 0.70 ]
  - sources:
      - layer_range: [ 14, 21 ]
        model: newsbang/Homer-v0.5-Qwen2.5-7B
      - layer_range: [ 14, 21 ]
        model: merges/agoramix-7b-reason-della
        t: [ 0.70, 0.70 ]
  - sources:
      - layer_range: [ 21, 28 ]
        model: newsbang/Homer-v0.5-Qwen2.5-7B
      - layer_range: [ 21, 28 ]
        model: merges/agoramix-7b-reason-della
        t: [ 0.70, 0.40 ]
dtype: bfloat16
out_dtype: bfloat16
---
name: agoramix-7b-finalize
merge_method: ties
base_model: Qwen/Qwen2.5-7B
tokenizer_source: huihui-ai/Qwen2.5-7B-Instruct-abliterated-v2
parameters:
  int8_mask: false
  normalize: true
  rescale: false
  density: 1.00
  weight: 1.00
models:
  - model: merges/agoramix-7b-finalize-slerp
dtype: bfloat16
out_dtype: bfloat16
```
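The merged model loads like any other Qwen2.5-7B checkpoint. Below is a minimal usage sketch with `transformers`, assuming the published repo id `sometimesanotion/KytheraMix-7B-v0.2` and that the merged tokenizer carries a standard Qwen2.5 chat template:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "sometimesanotion/KytheraMix-7B-v0.2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge's out_dtype
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain the difference between SLERP and TIES merging in two sentences."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Adjust `torch_dtype` and `device_map` to suit your hardware.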