README.md · Nohobby/MS-Schisandra-22B-v0.2 at 1e205b24e88a905c78a07fa170f3f5fa7fafbc5d

metadata

base_model:
  - unsloth/Mistral-Small-Instruct-2409
  - Gryphe/Pantheon-RP-Pure-1.6.2-22b-Small
  - anthracite-org/magnum-v4-22b
  - ArliAI/Mistral-Small-22B-ArliAI-RPMax-v1.1
  - spow12/ChatWaifu_v2.0_22B
  - rAIfle/Acolyte-22B
  - Envoid/Mistral-Small-NovusKyver
  - InferenceIllusionist/SorcererLM-22B
  - allura-org/MS-Meadowlark-22B
  - crestf411/MS-sunfall-v0.7.0
library_name: transformers
tags:
  - mergekit
  - merge
license: other
language:
  - en

Schisandra

Many thanks to the authors of the models used!

Overview

Main uses: RP, Storywriting

An intelligent model that is attentive to details and has a low-slop writing style. This time with a stable tokenizer.

Oh, and it now contains one more finetune! Not sure if some of them actually contribute to the output, but it's nice to see the numbers growing.

Quants

GGUF: Static | Imatrix

exl2: 4.65bpw 5.5bpw 6.5bpw

Settings

Prompt format: Mistral-V3 or this

Samplers: These or These

Merge Details

Merging steps

Step1

(Config partially taken from here

base_model: spow12/ChatWaifu_v2.0_22B
parameters:
  int8_mask: true
  rescale: true
  normalize: false
dtype: bfloat16
tokenizer_source: base
merge_method: della
models:
  - model: Envoid/Mistral-Small-NovusKyver
    parameters:
      density: [0.35, 0.65, 0.5, 0.65, 0.35]
      epsilon: [0.1, 0.1, 0.25, 0.1, 0.1]
      lambda: 0.85
      weight: [-0.01891, 0.01554, -0.01325, 0.01791, -0.01458]
  - model: rAIfle/Acolyte-22B
    parameters:
      density: [0.6, 0.4, 0.5, 0.4, 0.6]
      epsilon: [0.1, 0.1, 0.25, 0.1, 0.1]
      lambda: 0.85
      weight: [0.01847, -0.01468, 0.01503, -0.01822, 0.01459]

Step2

(Config partially taken from here)

base_model: InferenceIllusionist/SorcererLM-22B
parameters:
  int8_mask: true
  rescale: true
  normalize: false
dtype: bfloat16
tokenizer_source: base
merge_method: della
models:
  - model: crestf411/MS-sunfall-v0.7.0
    parameters:
      density: [0.35, 0.65, 0.5, 0.65, 0.35]
      epsilon: [0.1, 0.1, 0.25, 0.1, 0.1]
      lambda: 0.85
      weight: [-0.01891, 0.01554, -0.01325, 0.01791, -0.01458]
  - model: anthracite-org/magnum-v4-22b
    parameters:
      density: [0.6, 0.4, 0.5, 0.4, 0.6]
      epsilon: [0.1, 0.1, 0.25, 0.1, 0.1]
      lambda: 0.85
      weight: [0.01847, -0.01468, 0.01503, -0.01822, 0.01459]

SchisandraVA2

(Config taken from here)

merge_method: della_linear
dtype: bfloat16
parameters:
  normalize: true
  int8_mask: true
tokenizer_source: base
base_model: TheDrummer/UnslopSmall-22B-v1
models:
    - model: ArliAI/Mistral-Small-22B-ArliAI-RPMax-v1.1
      parameters:
        density: 0.55
        weight: 1
    - model: Gryphe/Pantheon-RP-Pure-1.6.2-22b-Small
      parameters:
        density: 0.55
        weight: 1
    - model: Step1
      parameters:
        density: 0.55
        weight: 1
    - model: allura-org/MS-Meadowlark-22B
      parameters:
        density: 0.55
        weight: 1
    - model: Step2
      parameters:
        density: 0.55
        weight: 1

Schisandra-v0.2

dtype: bfloat16
tokenizer_source: base
merge_method: della_linear
parameters:
  density: 0.5
base_model: SchisandraVA2
models:
  - model: unsloth/Mistral-Small-Instruct-2409
    parameters:
      weight:
        - filter: v_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: o_proj
          value: [1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1]
        - filter: up_proj
          value: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
        - filter: gate_proj
          value: [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]
        - filter: down_proj
          value: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
        - value: 0
  - model: SchisandraVA2
    parameters:
      weight:
        - filter: v_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: o_proj
          value: [0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0]
        - filter: up_proj
          value: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
        - filter: gate_proj
          value: [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
        - filter: down_proj
          value: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
        - value: 1