llama-3-wissenschaft-8B-v2

This model is based on Llama-3-8b and is governed by the Meta Llama 3 Community License Agreement.

It is nbeerbower/llama-3-bophades-v3-8B fine-tuned with DPO on tasksource/ScienceQA_text_only.

Method

Fine-tuned on an A100 GPU in Google Colab.

Reference guide: Fine-Tune Your Own Llama 2 Model in a Colab Notebook

Configuration

Dataset preparation, system prompt:

def get_correct_answer(example):
    # 'answer' is the index of the correct entry in 'choices'
    answerIdx = example['answer']
    choices = example['choices']
    return choices[answerIdx]

def get_wrong_answer(example):
    # Return the first choice that is not the correct answer
    choices = example['choices']
    answerIdx = example['answer']
    for i in range(len(choices)):
        if i != answerIdx:
            return choices[i]

def chatml_format(example):
    # Format system
    systemMessage = "Read the following lecture, then answer the question."
    system = "<|im_start|>system\n" + systemMessage + "<|im_end|>\n"

    # Format instruction
    instruction = ""
    if example.get('lecture'):
        instruction = "Lecture: " + example['lecture'] + "\nQuestion: "
    else:
        instruction = "Question: "
    instruction += example['question']

    # Format prompt
    prompt = "<|im_start|>user\n" + instruction + "<|im_end|>\n<|im_start|>assistant\n"

    # Format chosen answer
    chosen = get_correct_answer(example) + "<|im_end|>\n"

    # Format rejected answer
    rejected = get_wrong_answer(example) + "<|im_end|>\n"

    return {
        "prompt": system + prompt,
        "chosen": chosen,
        "rejected": rejected,
    }
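To make the resulting preference format concrete, here is a small self-contained sketch that applies the same formatting logic to a single made-up record. The sample question, choices, and the empty lecture are hypothetical and are not taken from ScienceQA; the helper functions are condensed copies of the ones above.

```python
# Condensed copies of the helpers above, so this snippet runs on its own.
def get_correct_answer(example):
    return example["choices"][example["answer"]]

def get_wrong_answer(example):
    # First choice whose index differs from the correct one
    for i, choice in enumerate(example["choices"]):
        if i != example["answer"]:
            return choice

def chatml_format(example):
    system_message = "Read the following lecture, then answer the question."
    system = "<|im_start|>system\n" + system_message + "<|im_end|>\n"
    if example.get("lecture"):
        instruction = "Lecture: " + example["lecture"] + "\nQuestion: "
    else:
        instruction = "Question: "
    instruction += example["question"]
    prompt = "<|im_start|>user\n" + instruction + "<|im_end|>\n<|im_start|>assistant\n"
    return {
        "prompt": system + prompt,
        "chosen": get_correct_answer(example) + "<|im_end|>\n",
        "rejected": get_wrong_answer(example) + "<|im_end|>\n",
    }

# Hypothetical record (illustrative only, not from the dataset)
sample = {
    "question": "Which is a solid?",
    "choices": ["water", "a rock", "air"],
    "answer": 1,
    "lecture": "",
}

formatted = chatml_format(sample)
print(formatted["prompt"])
print("chosen:  ", repr(formatted["chosen"]))
print("rejected:", repr(formatted["rejected"]))
```

Note that the rejected answer is always the first incorrect choice, so questions with more than two choices contribute only one preference pair each.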

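from datasets import load_dataset
from transformers import AutoTokenizer

# Base model (the model named above as the starting point)
model_name = "nbeerbower/llama-3-bophades-v3-8B"

dataset = load_dataset("tasksource/ScienceQA_text_only")['train']

# Save columns
original_columns = dataset.column_names

# Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"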

# Format dataset
dataset = dataset.map(
    chatml_format,
    remove_columns=original_columns
)

LoRA, model, and training settings:

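import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, TrainingArguments
from trl import DPOTrainer

# LoRA configuration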
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
)
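As a rough illustration of what this LoRA configuration implies, the sketch below estimates the number of trainable adapter parameters. The layer dimensions are assumptions based on the published Llama-3-8B architecture (hidden size 4096, MLP size 14336, 32 layers, key/value projection width 1024 from grouped-query attention); they are not read from this model's config.

```python
# Values taken from the LoRA configuration above
R = 16
LORA_ALPHA = 16

# Assumed Llama-3-8B dimensions (not read from the model config)
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024  # 8 key/value heads x 128 head dim
NUM_LAYERS = 32

# (in_features, out_features) for each targeted linear layer
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_param_count(in_features, out_features, r):
    # LoRA adds A (r x in) and B (out x r); bias="none" adds no bias terms
    return r * (in_features + out_features)

per_layer = sum(lora_param_count(i, o, R) for i, o in MODULE_SHAPES.values())
total = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {total:,}")
print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions the adapters amount to roughly 42 million trainable parameters, a small fraction of the 8B-parameter base model, and the update is applied with a scaling factor of 1.0.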

# Model to fine-tune
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)
model.config.use_cache = False

# Reference model
ref_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    load_in_4bit=True
)

# Training arguments (new_model, the output directory for the fine-tuned model, must be defined beforehand)
training_args = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    max_steps=1000,
    save_strategy="no",
    logging_steps=1,
    output_dir=new_model,
    optim="paged_adamw_32bit",
    warmup_steps=100,
    bf16=True,
    report_to="wandb",
)
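The training arguments above can be read as follows. This is a minimal sketch, assuming a single GPU and assuming the "cosine" scheduler follows the usual linear-warmup-then-cosine-decay shape; it approximates the schedule in plain Python rather than calling the library's scheduler.

```python
import math

# Values taken from the training arguments above
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 4
MAX_STEPS = 1000
WARMUP_STEPS = 100
BASE_LR = 5e-5

# Assumes one device: preference pairs consumed per optimizer step
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
pairs_seen = effective_batch * MAX_STEPS

def lr_at(step, base_lr=BASE_LR, warmup=WARMUP_STEPS, total=MAX_STEPS):
    """Approximate learning rate at a given optimizer step."""
    if step < warmup:
        # Linear warmup from 0 to base_lr
        return base_lr * step / warmup
    # Cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup) / (total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print("effective batch size:", effective_batch)
print("preference pairs processed:", pairs_seen)
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at(step):.2e}")
```

With these settings the learning rate peaks at 5e-5 after 100 steps and decays to zero by step 1000.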

# Create DPO trainer
dpo_trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
    force_use_ref_model=True
)
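For readers unfamiliar with the role of beta and the reference model, the sketch below computes the standard sigmoid DPO loss for one preference pair in plain Python. It assumes the trainer uses the standard sigmoid variant of the loss and is not the library's implementation; the log-probability values are made up for illustration.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Sigmoid DPO loss for one (chosen, rejected) pair.

    Each argument is the summed log-probability of a completion
    under the policy or the frozen reference model.
    """
    # How much more the policy prefers chosen over rejected than the reference does
    margin = ((policy_chosen_logp - policy_rejected_logp)
              - (ref_chosen_logp - ref_rejected_logp))
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))

# Hypothetical log-probabilities (illustrative only)
no_preference = dpo_loss(-10.0, -10.0, -10.0, -10.0)
prefers_chosen = dpo_loss(-5.0, -20.0, -10.0, -10.0)
prefers_rejected = dpo_loss(-20.0, -5.0, -10.0, -10.0)

print(f"policy equals reference: {no_preference:.4f}")
print(f"policy prefers chosen:   {prefers_chosen:.4f}")
print(f"policy prefers rejected: {prefers_rejected:.4f}")
```

A smaller beta, such as the 0.1 used here, makes the loss less sensitive to how far the policy drifts from the reference model for a given preference margin.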