metadata

license: cc-by-nc-4.0
tags:
  - not-for-all-audiences
  - merge
model-index:
  - name: Buttocks-7B-v1.1
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 54.61
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/Buttocks-7B-v1.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 75.61
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/Buttocks-7B-v1.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 50.22
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/Buttocks-7B-v1.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 44.72
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/Buttocks-7B-v1.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 68.9
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/Buttocks-7B-v1.1
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 5.76
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=TeeZee/Buttocks-7B-v1.1
          name: Open LLM Leaderboard

Buttocks 7B v1.1

An experiment that has gone very, very wrong.

Model details

A recreation of the original recipe for Undi95/Toppy-M-7B, but the final merge was done with MergeMonster (using the extended RPG preset) instead of mergekit.
The recipe is in mergekit-config; steps AA, BB, and CC are the original models with LoRAs, as per the Toppy-M-7B sauce.
The LERP merge method was used.
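
For readers unfamiliar with LERP: it is plain linear interpolation between two sets of weights. The snippet below is only a toy sketch of that idea on small Python lists. It is not MergeMonster's actual code, and the names `lerp`, `lerp_state_dicts`, and the weight `t` are hypothetical, chosen for illustration.

```python
def lerp(a, b, t):
    """Linear interpolation: returns a when t == 0 and b when t == 1."""
    return (1.0 - t) * a + t * b


def lerp_state_dicts(model_a, model_b, t):
    """Interpolate two toy 'state dicts' (parameter name -> list of floats).

    Hypothetical helper: real merges operate on tensors, not Python lists.
    """
    if model_a.keys() != model_b.keys():
        raise ValueError("both models must have the same parameter names")
    merged = {}
    for name in model_a:
        weights_a, weights_b = model_a[name], model_b[name]
        if len(weights_a) != len(weights_b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Blend every weight element-wise with the same ratio t.
        merged[name] = [lerp(x, y, t) for x, y in zip(weights_a, weights_b)]
    return merged


# Toy example: keep 75% of model A and take 25% of model B.
merged = lerp_state_dicts({"w": [0.0, 2.0]}, {"w": [4.0, 6.0]}, t=0.25)
print(merged)  # {'w': [1.0, 3.0]}
```

A small `t` keeps the result close to the first model, which is the same idea as merging at low weights.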

Results

in simple terms this model is totally unhinged
it always produces sequences similar to fever dreams or drug trips
on a good day it can produce scenarios similar to old Monty Python sketches
the model shows an incredible affinity for words like 'ass', 'buttocks', 'farts'; prompting with any one of those words will probably produce a whole story revolving around that topic.

Possible uses

to generate a dream sequence in a story
to make a boring model more unpredictable by merging it at low weights with this monster
to take a break: connect SillyTavern to this model and get a few ROTFLs observing how every story deteriorates into pure craziness
research on LLM hallucinations

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	49.97
AI2 Reasoning Challenge (25-Shot)	54.61
HellaSwag (10-Shot)	75.61
MMLU (5-Shot)	50.22
TruthfulQA (0-shot)	44.72
Winogrande (5-shot)	68.90
GSM8k (5-shot)	5.76
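
The Avg. row appears to be the plain arithmetic mean of the six benchmark scores above. A quick check, assuming equal weighting (the dictionary name `scores` is just for illustration):

```python
# Benchmark scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 54.61,
    "HellaSwag (10-Shot)": 75.61,
    "MMLU (5-Shot)": 50.22,
    "TruthfulQA (0-shot)": 44.72,
    "Winogrande (5-shot)": 68.90,
    "GSM8k (5-shot)": 5.76,
}

# Unweighted mean over all six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 49.97
```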