sometimesanotion/KytheraMix-7B-v0.2

Text Generation

text-generation-inference

Inference Endpoints

sometimesanotion committed on Nov 29, 2024

Commit f7517a7 · verified · 1 Parent(s): 20240c7

Update README.md

Files changed (1)

README.md +24 -0

@@ -38,6 +38,30 @@ KytheraMix-7B is crafted using semi-automated merges YAML templates. Like AgoraM
 The following YAML configuration was used to produce this model:
 ```yaml
+name:                agoramix-7b-if-della                      # This contributes instruction following
+merge_method:        della
+base_model:          Qwen/Qwen2.5-7B
+tokenizer_source:    base
+parameters:
+  int8_mask:         false
+  normalize:         true
+  rescale:           false
+  density:           0.30
+  weight:            0.50
+  epsilon:           0.09
+  lambda:            0.95
+models:
+  - model:           newsbang/Homer-v0.5-Qwen2.5-7B            # Exceptional instruction following, coding, math
+    parameters:
+      density:       0.80
+      weight:        1.00
+  - model:           sethuiyer/Qwen2.5-7B-Anvita               # Good instruction following, combined with exceptional recall and reasoning
+    parameters:
+      density:       0.30
+      weight:        [ 0.00, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40, 0.30, 0.30 ]
+dtype:               bfloat16
+out_dtype:           bfloat16
+---
 name:                agoramix-7b-reason-della                  # This contributes the knowledge and reasoning pool, later to be merged
 merge_method:        della                                      # with the dominant instruction-following model
 base_model:          Qwen/Qwen2.5-7B