# Spaetzle-v12-7b

Spaetzle-v12-7b is a merge of the following models using LazyMergekit:

* [flemmingmiguel/NeuDist-Ro-7B](https://huggingface.co/flemmingmiguel/NeuDist-Ro-7B)
* [Blizado/discolm-mfto-7b-german-v0.1](https://huggingface.co/Blizado/discolm-mfto-7b-german-v0.1)
* [ResplendentAI/Flora_DPO_7B](https://huggingface.co/ResplendentAI/Flora_DPO_7B)

with [mayflowergmbh/Wiedervereinigung-7b-dpo-laser](https://huggingface.co/mayflowergmbh/Wiedervereinigung-7b-dpo-laser) as the base model.

As expected, this is slightly worse than cstr/spaetzle-v8-7b on general English tasks, but slightly better on at least some German tasks: e.g. it reaches an EQ-Bench (de) score of 64.81.

Open LLM Leaderboard results:

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 69.36 |
| AI2 Reasoning Challenge (25-Shot) | 65.96 |
| HellaSwag (10-Shot)               | 86.16 |
| MMLU (5-Shot)                     | 63.48 |
| TruthfulQA (0-shot)               | 57.84 |
| Winogrande (5-shot)               | 80.03 |
| GSM8k (5-shot)                    | 62.70 |

Averages on the benchmark suites detailed below:

| Model           | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|-----------------|--------:|--------:|-----------:|---------:|--------:|
| Spaetzle-v12-7b | 42.64   | 74.3    | 58.44      | 44.44    | 54.95   |

### AGIEval

| Task                           | Version | Metric   | Value |   | Stderr |
|--------------------------------|--------:|----------|------:|---|-------:|
| agieval_aqua_rat               |       0 | acc      | 24.02 | ± |   2.69 |
|                                |         | acc_norm | 21.65 | ± |   2.59 |
| agieval_logiqa_en              |       0 | acc      | 36.10 | ± |   1.88 |
|                                |         | acc_norm | 37.63 | ± |   1.90 |
| agieval_lsat_ar                |       0 | acc      | 24.35 | ± |   2.84 |
|                                |         | acc_norm | 23.04 | ± |   2.78 |
| agieval_lsat_lr                |       0 | acc      | 48.82 | ± |   2.22 |
|                                |         | acc_norm | 47.25 | ± |   2.21 |
| agieval_lsat_rc                |       0 | acc      | 60.59 | ± |   2.98 |
|                                |         | acc_norm | 57.99 | ± |   3.01 |
| agieval_sat_en                 |       0 | acc      | 76.21 | ± |   2.97 |
|                                |         | acc_norm | 74.76 | ± |   3.03 |
| agieval_sat_en_without_passage |       0 | acc      | 46.60 | ± |   3.48 |
|                                |         | acc_norm | 45.63 | ± |   3.48 |
| agieval_sat_math               |       0 | acc      | 37.27 | ± |   3.27 |
|                                |         | acc_norm | 33.18 | ± |   3.18 |

Average: 42.64%

### GPT4All

| Task          | Version | Metric   | Value |   | Stderr |
|---------------|--------:|----------|------:|---|-------:|
| arc_challenge |       0 | acc      | 59.13 | ± |   1.44 |
|               |         | acc_norm | 61.26 | ± |   1.42 |
| arc_easy      |       0 | acc      | 83.67 | ± |   0.76 |
|               |         | acc_norm | 80.89 | ± |   0.81 |
| boolq         |       1 | acc      | 87.83 | ± |   0.57 |
| hellaswag     |       0 | acc      | 66.45 | ± |   0.47 |
|               |         | acc_norm | 84.63 | ± |   0.36 |
| openbookqa    |       0 | acc      | 37.40 | ± |   2.17 |
|               |         | acc_norm | 45.80 | ± |   2.23 |
| piqa          |       0 | acc      | 82.15 | ± |   0.89 |
|               |         | acc_norm | 83.13 | ± |   0.87 |
| winogrande    |       0 | acc      | 76.56 | ± |   1.19 |

Average: 74.3%

### TruthfulQA

| Task          | Version | Metric | Value |   | Stderr |
|---------------|--------:|--------|------:|---|-------:|
| truthfulqa_mc |       1 | mc1    | 42.59 | ± |   1.73 |
|               |         | mc2    | 58.44 | ± |   1.58 |

Average: 58.44%

### Bigbench

| Task                                             | Version | Metric                | Value |   | Stderr |
|--------------------------------------------------|--------:|-----------------------|------:|---|-------:|
| bigbench_causal_judgement                        |       0 | multiple_choice_grade | 55.26 | ± |   3.62 |
| bigbench_date_understanding                      |       0 | multiple_choice_grade | 64.77 | ± |   2.49 |
| bigbench_disambiguation_qa                       |       0 | multiple_choice_grade | 37.60 | ± |   3.02 |
| bigbench_geometric_shapes                        |       0 | multiple_choice_grade | 32.31 | ± |   2.47 |
|                                                  |         | exact_str_match       | 21.45 | ± |   2.17 |
| bigbench_logical_deduction_five_objects          |       0 | multiple_choice_grade | 31.00 | ± |   2.07 |
| bigbench_logical_deduction_seven_objects         |       0 | multiple_choice_grade | 22.43 | ± |   1.58 |
| bigbench_logical_deduction_three_objects         |       0 | multiple_choice_grade | 53.00 | ± |   2.89 |
| bigbench_movie_recommendation                    |       0 | multiple_choice_grade | 40.40 | ± |   2.20 |
| bigbench_navigate                                |       0 | multiple_choice_grade | 51.30 | ± |   1.58 |
| bigbench_reasoning_about_colored_objects         |       0 | multiple_choice_grade | 68.50 | ± |   1.04 |
| bigbench_ruin_names                              |       0 | multiple_choice_grade | 48.66 | ± |   2.36 |
| bigbench_salient_translation_error_detection     |       0 | multiple_choice_grade | 30.36 | ± |   1.46 |
| bigbench_snarks                                  |       0 | multiple_choice_grade | 70.17 | ± |   3.41 |
| bigbench_sports_understanding                    |       0 | multiple_choice_grade | 70.39 | ± |   1.45 |
| bigbench_temporal_sequences                      |       0 | multiple_choice_grade | 31.00 | ± |   1.46 |
| bigbench_tracking_shuffled_objects_five_objects  |       0 | multiple_choice_grade | 21.44 | ± |   1.16 |
| bigbench_tracking_shuffled_objects_seven_objects |       0 | multiple_choice_grade | 18.29 | ± |   0.92 |
| bigbench_tracking_shuffled_objects_three_objects |       0 | multiple_choice_grade | 53.00 | ± |   2.89 |

Average: 44.44%

Average score: 54.95%

Elapsed time: 02:50:51
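
The per-task tables above follow lm-evaluation-harness output format. For orientation, a run along the following lines would reproduce a subset of the GPT4All scores; this is a sketch only, since the exact harness version, task list, and settings used for the numbers above are not documented here:

```python
!pip install -qU lm-eval

# Illustrative only: task names follow lm-evaluation-harness v0.4 conventions
# and may not match the exact configuration used for the tables above.
!lm_eval --model hf \
    --model_args pretrained=cstr/Spaetzle-v12-7b,dtype=bfloat16 \
    --tasks arc_challenge,arc_easy,boolq,hellaswag,openbookqa,piqa,winogrande \
    --batch_size auto
```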

## 🧩 Configuration

```yaml
models:
  - model: mayflowergmbh/Wiedervereinigung-7b-dpo-laser
    # no parameters necessary for base model
  - model: flemmingmiguel/NeuDist-Ro-7B
    parameters:
      density: 0.60
      weight: 0.30
  - model: Blizado/discolm-mfto-7b-german-v0.1
    parameters:
      density: 0.65
      weight: 0.40
  - model: ResplendentAI/Flora_DPO_7B
    parameters:
      density: 0.6
      weight: 0.3
merge_method: dare_ties
base_model: mayflowergmbh/Wiedervereinigung-7b-dpo-laser
parameters:
  int8_mask: true
dtype: bfloat16
random_seed: 0
tokenizer_source: base
```
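
To reproduce the merge itself, the config can be fed to mergekit's CLI. This is a sketch: it assumes the YAML above is saved as `config.yaml`, and the output directory name is arbitrary.

```python
!pip install -qU mergekit

# Runs the DARE-TIES merge described in config.yaml and writes the merged
# weights to ./Spaetzle-v12-7b (the tokenizer is taken from the base model,
# per tokenizer_source: base)
!mergekit-yaml config.yaml ./Spaetzle-v12-7b
```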

## 💻 Usage

```python
!pip install -qU transformers accelerate

from transformers import AutoTokenizer
import transformers
import torch

model = "cstr/Spaetzle-v12-7b"
messages = [{"role": "user", "content": "What is a large language model?"}]

# Format the chat messages with the model's chat template
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Build a text-generation pipeline, sharding the model across available devices
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Sample a response
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```
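
If you prefer to load the model directly rather than through a pipeline, an equivalent `generate`-based call looks roughly like this (a minimal sketch; the German prompt is just an illustration of the model's target language, and the sampling settings mirror the pipeline call above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "cstr/Spaetzle-v12-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# German example prompt, since the merge targets German performance
messages = [{"role": "user", "content": "Was ist ein großes Sprachmodell?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.95
)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```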
