Imatrix compressions of the full-precision (FP) merge "D_AU-Orac-13B-Tiefighter-slerp".

"Imatrix Plus" is an upgraded form of Imatrix which using full precision for specific parts of the compression. As a result all compressions will be slightly larger in size than standard 13B compressions.

This method results in a higher-quality model, especially at lower compression levels, and is applied across all compressions from IQ1 to Q8.

Even IQ1_S - the most compressed version - works well; however, IQ4/Q4 are suggested as minimums for quality. Highest quality will be Q6/Q8.

How big a difference does this merge make?

Original Tiefighter IQ1_S (with imatrix enhancements) tested at a perplexity of: PPL = 17.2589 +/- 0.12466*

This Tiefighter/Orca 2 merge at IQ1_S (with imatrix enhancements) tested at a perplexity of: PPL = 12.6985 +/- 0.09106*

Note that LOWER perplexity is better.

  • Tested using llama.cpp's perplexity tool (perplexity.exe) with wiki.raw.
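
For reference, a run along these lines can be reproduced with llama.cpp's perplexity tool. A minimal sketch, assuming the binary and file names below (adjust for your build and quant choice):

import subprocess

# Illustrative paths: point these at your llama.cpp build, the quantized
# model file, and the wiki.raw evaluation text mentioned above.
result = subprocess.run(
    [
        "./perplexity",                                      # llama.cpp perplexity binary
        "-m", "D_AU-Orac-13B-Tiefighter-slerp.IQ1_S.gguf",   # quantized model
        "-f", "wiki.test.raw",                               # evaluation text
    ],
    capture_output=True,
    text=True,
)

# The tool prints running estimates and a final "PPL = <value> +/- <error>"
# figure like the ones quoted above.
print(result.stdout)
print(result.stderr)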

In addition, the Imatrix file used to "fix" the compressed files post-compression resulted in over 2 full points lower perplexity at IQ1_S versus some of the other "Imatrix" files currently in use.

Original Tiefighter IQ1_S (with imatrix enhancements) tested with a different "Imatrix" repair file at a perplexity of: PPL = 19.6355 +/- 0.14435

The merge itself also affected perplexity.

This merge was an experiment to combine "Tiefighter"'s already established roleplay, fiction, and story-generation abilities with some of "Orca 2"'s qualities.

Additional merge experiments are in progress.

For Imatrix Plus, this was a test of using high precision in specific areas of the model, leading to a slightly larger compressed file. In addition, the Imatrix process itself used a larger "calibration" file than standard to further enhance quality.

The process added approximately 310 MB to each compressed file.
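
For context, the standard imatrix workflow in llama.cpp is a two-step process: build an importance matrix from a calibration file, then apply it during quantization. A minimal sketch of that baseline process, with illustrative file names (the "Plus" precision changes described above are not shown):

import subprocess

# Step 1: build an importance matrix from a calibration file.
subprocess.run([
    "./imatrix",
    "-m", "D_AU-Orac-13B-Tiefighter-slerp.f16.gguf",  # full-precision merge
    "-f", "calibration.txt",                          # calibration text
    "-o", "imatrix.dat",                              # resulting importance matrix
], check=True)

# Step 2: quantize with the importance matrix applied.
subprocess.run([
    "./quantize",
    "--imatrix", "imatrix.dat",
    "D_AU-Orac-13B-Tiefighter-slerp.f16.gguf",
    "D_AU-Orac-13B-Tiefighter-slerp.IQ1_S.gguf",
    "IQ1_S",
], check=True)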

A blank or standard Alpaca template will work for text generation. "ChatML" is currently untested.

Context length: 4096.
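
For example, the quants can be run with the llama-cpp-python bindings using a standard Alpaca prompt. A minimal sketch, with an illustrative quant file name:

from llama_cpp import Llama

llm = Llama(
    model_path="D_AU-Orac-13B-Tiefighter-slerp.Q4_K_M.gguf",  # illustrative file name
    n_ctx=4096,  # matches the context length above
)

# Standard Alpaca-style prompt template.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite the opening scene of a space-opera story.\n\n"
    "### Response:\n"
)

out = llm(prompt, max_tokens=256, stop=["### Instruction:"])
print(out["choices"][0]["text"])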

Please see the original model card for specific details of use, additional credits and tips:

KoboldAI/LLaMA2-13B-Tiefighter

Merge

This is a merge of pre-trained language models created using mergekit.

Merge Details

Merge Method

This model was merged using the SLERP merge method.
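
SLERP interpolates between corresponding tensors along an arc rather than a straight line, which preserves the geometry of the weights better than plain averaging; the t: values in the configuration below control how far each layer leans toward each parent, with separate schedules for self_attn and mlp tensors. As an illustration of the underlying formula only (not mergekit's actual code), a minimal numpy sketch:

import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors."""
    v0n = v0 / (np.linalg.norm(v0) + eps)
    v1n = v1 / (np.linalg.norm(v1) + eps)
    # Angle between the two (normalized) tensors.
    omega = np.arccos(np.clip(np.dot(v0n, v1n), -1.0, 1.0))
    if omega < eps:
        # Nearly parallel: fall back to linear interpolation.
        return (1.0 - t) * v0 + t * v1
    so = np.sin(omega)
    return (np.sin((1.0 - t) * omega) / so) * v0 + (np.sin(t * omega) / so) * v1

# t = 0 returns the first tensor unchanged; t = 1 returns the second.
mixed = slerp(0.5, np.random.randn(8), np.random.randn(8))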

Models Merged

The following models were included in the merge:

  • KoboldAI/LLaMA2-13B-Tiefighter
  • microsoft/Orca-2-13b

Configuration

The following YAML configuration was used to produce this model:

slices:
  - sources:
      - model: KoboldAI/LLaMA2-13B-Tiefighter
        layer_range: [0, 40]
      - model: microsoft/Orca-2-13b
        layer_range: [0, 40]
merge_method: slerp
base_model: microsoft/Orca-2-13b
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
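
To reproduce a merge like this, the configuration above can be saved to a file and passed to mergekit. A minimal sketch, assuming mergekit is installed and the config is saved as "tiefighter-orca-slerp.yml" (an illustrative name):

import subprocess

# mergekit-yaml reads the config and writes the merged model to the
# given output directory.
subprocess.run(
    ["mergekit-yaml", "tiefighter-orca-slerp.yml", "./merged-model"],
    check=True,
)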