CapyLake-7B-v2-laser

This model is a finetune of cognitivecomputations/WestLake-7B-v2-Laser on argilla/distilabel-capybara-dpo-7k-binarized

image/webp

Built with Distilabel

Process

  • Realigned the chat template to ChatML
  • Completed 1 Epoch
  • 5e-05 learning rate
  • Training time was about 2 hours on 1 H100
  • Cost was ~$8

Code Example

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "macadeliccc/CapyLake-7B-v2-laser"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

text = "Create an idea for a TV show and write a short pilot script"
inputs = tokenizer(text, return_tensors="pt")

# Adding hyperparameters to the generation call
outputs = model.generate(
    **inputs,
    max_new_tokens=4096,  # Controls the maximum length of the new tokens created
    temperature=0.7,  # Adjust for creativity (lower is less random)
    top_k=50,  # Keeps the top k tokens for sampling
    top_p=0.95,  # Uses nucleus sampling with this cumulative probability
    num_return_sequences=1,  # Number of sequences to generate
    no_repeat_ngram_size=2,  # Prevents repeating n-grams to ensure diversity
    early_stopping=True  # Stops generation when all sequences reach the EOS token
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Other Capy Models

SOLAR-10.7B-Capy-v1.0 is also on the way. There could be more depending on performance!

Evaluations

Model AGIEval GPT4All TruthfulQA Bigbench Average
CapyLake-7B-v2-laser 44.34 77.77 68.47 47.92 59.62

AGIEval

Task Version Metric Value Stderr
agieval_aqua_rat 0 acc 28.35 ± 2.83
acc_norm 25.98 ± 2.76
agieval_logiqa_en 0 acc 38.86 ± 1.91
acc_norm 39.02 ± 1.91
agieval_lsat_ar 0 acc 25.22 ± 2.87
acc_norm 24.35 ± 2.84
agieval_lsat_lr 0 acc 50.39 ± 2.22
acc_norm 51.57 ± 2.22
agieval_lsat_rc 0 acc 65.06 ± 2.91
acc_norm 63.94 ± 2.93
agieval_sat_en 0 acc 78.64 ± 2.86
acc_norm 78.64 ± 2.86
agieval_sat_en_without_passage 0 acc 40.78 ± 3.43
acc_norm 40.78 ± 3.43
agieval_sat_math 0 acc 33.64 ± 3.19
acc_norm 30.45 ± 3.11

Average: 44.34%

GPT4All

Task Version Metric Value Stderr
arc_challenge 0 acc 66.89 ± 1.38
acc_norm 67.49 ± 1.37
arc_easy 0 acc 86.70 ± 0.70
acc_norm 81.90 ± 0.79
boolq 1 acc 88.10 ± 0.57
hellaswag 0 acc 71.45 ± 0.45
acc_norm 87.78 ± 0.33
openbookqa 0 acc 39.80 ± 2.19
acc_norm 49.80 ± 2.24
piqa 0 acc 82.86 ± 0.88
acc_norm 84.87 ± 0.84
winogrande 0 acc 84.45 ± 1.02

Average: 77.77%

TruthfulQA

Task Version Metric Value Stderr
truthfulqa_mc 1 mc1 53.98 ± 1.74
mc2 68.47 ± 1.53

Average: 68.47%

Bigbench

Task Version Metric Value Stderr
bigbench_causal_judgement 0 multiple_choice_grade 59.47 ± 3.57
bigbench_date_understanding 0 multiple_choice_grade 64.50 ± 2.49
bigbench_disambiguation_qa 0 multiple_choice_grade 44.96 ± 3.10
bigbench_geometric_shapes 0 multiple_choice_grade 22.84 ± 2.22
exact_str_match 2.79 ± 0.87
bigbench_logical_deduction_five_objects 0 multiple_choice_grade 30.80 ± 2.07
bigbench_logical_deduction_seven_objects 0 multiple_choice_grade 21.57 ± 1.56
bigbench_logical_deduction_three_objects 0 multiple_choice_grade 56.67 ± 2.87
bigbench_movie_recommendation 0 multiple_choice_grade 51.60 ± 2.24
bigbench_navigate 0 multiple_choice_grade 51.00 ± 1.58
bigbench_reasoning_about_colored_objects 0 multiple_choice_grade 70.35 ± 1.02
bigbench_ruin_names 0 multiple_choice_grade 51.79 ± 2.36
bigbench_salient_translation_error_detection 0 multiple_choice_grade 35.97 ± 1.52
bigbench_snarks 0 multiple_choice_grade 79.01 ± 3.04
bigbench_sports_understanding 0 multiple_choice_grade 75.66 ± 1.37
bigbench_temporal_sequences 0 multiple_choice_grade 47.90 ± 1.58
bigbench_tracking_shuffled_objects_five_objects 0 multiple_choice_grade 23.84 ± 1.21
bigbench_tracking_shuffled_objects_seven_objects 0 multiple_choice_grade 18.00 ± 0.92
bigbench_tracking_shuffled_objects_three_objects 0 multiple_choice_grade 56.67 ± 2.87

Average: 47.92%

Average score: 59.62%

Elapsed time: 01:57:56

Downloads last month
8
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train macadeliccc/SOLAR-10.7b-Instruct-truthy-dpo-exlv2