---
license: cc-by-sa-3.0
datasets:
  - euclaise/TinyCoT
  - euclaise/reddit-instruct
  - sablo/oasst2_curated
library_name: transformers
tags:
  - supertrainer2000
---

Memphis-CoT is a finetune of StableLM-3B-4E1T on TinyCoT, along with reddit-instruct and a curated subset of oasst2.

Memphis was trained only on human data! No GPT generations here.

Finetuning was performed with my supertrainer2000 framework, using my Adalite optimizer.

## Training Procedure

I finetuned the model using an iterative rationale-bootstrapping procedure inspired by STaR and SPIN.

First, I finetuned the model on all of the datasets for 2 epochs, using a MixCE loss and NEFTune.
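
MixCE mixes the standard forward cross-entropy with an approximation of the reverse cross-entropy. As a rough illustration only (not supertrainer2000's actual implementation), here is a minimal PyTorch sketch of the common token-level approximation, where the reverse-CE term reweights each gold token's NLL by the model's own detached probability of that token; the 0.75 ratio matches the hyperparameters listed below:

```python
import torch
import torch.nn.functional as F

def mixce_loss(logits: torch.Tensor, labels: torch.Tensor, ratio: float = 0.75):
    """ratio * forward CE + (1 - ratio) * approximate reverse CE.

    The reverse-CE term is approximated by weighting each gold token's
    negative log-likelihood by the model's own (detached) probability of
    that token, so tokens the model already finds likely are reinforced
    more heavily. Labels of -100 mark ignored (prompt) positions.
    """
    logits, labels = logits[:, :-1], labels[:, 1:]      # next-token shift
    logp = F.log_softmax(logits, dim=-1)
    mask = labels != -100
    nll = -logp.gather(-1, labels.clamp_min(0).unsqueeze(-1)).squeeze(-1)
    p_gold = (-nll).exp().detach()                      # model prob of gold token
    per_token = (ratio + (1.0 - ratio) * p_gold) * nll
    return (per_token * mask).sum() / mask.sum()
```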

I then performed the following steps 3 times:

1. Generate responses for each question in TinyCoT using the current model, check each response for correctness, and build a dataset of (correct, incorrect) pairs. Duplicates and leftover responses are discarded, so each correct and each incorrect response appears in at most one pair (see the sketch after this list).
2. Finetune the model for 1 epoch using a ranking loss over the length-normalized log-probabilities of the correct vs. incorrect generated responses, similar to Preference Ranking Optimization. A standard CE loss over the ground-truth rationale was included to prevent excessive drift (a sketch follows the next paragraph).
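
As a concrete sketch of step 1's pair construction (`check_answer` is a hypothetical placeholder for the TinyCoT correctness check, not an actual API):

```python
def build_pairs(question, responses, check_answer):
    """Deduplicate generated responses, split them by correctness, and
    pair each unique correct response with a unique incorrect one."""
    unique = list(dict.fromkeys(responses))          # dedupe, keep order
    correct = [r for r in unique if check_answer(question, r)]
    incorrect = [r for r in unique if not check_answer(question, r)]
    return list(zip(correct, incorrect))             # extras are discarded
```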

This should be more efficient than either STaR or SPIN, as it uses a ranking loss rather than rejection sampling (unlike STaR), and verifies correctness instead of assuming all model responses are incorrect (unlike SPIN).
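
To make the ranking objective concrete, here is a minimal sketch of step 2's loss under my reading of the description above; the exact form in supertrainer2000 may differ, and the weight of 5 matches the rank loss weight listed below:

```python
import torch
import torch.nn.functional as F

def seq_score(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Length-normalized log-probability of each sequence; label -100
    marks prompt/padding positions excluded from the score."""
    logp = F.log_softmax(logits[:, :-1], dim=-1)
    labels = labels[:, 1:]
    mask = labels != -100
    tok = logp.gather(-1, labels.clamp_min(0).unsqueeze(-1)).squeeze(-1)
    return (tok * mask).sum(-1) / mask.sum(-1)

def rank_loss(score_correct, score_incorrect, ce_ground_truth, rank_weight=5.0):
    """PRO-style pairwise term that prefers the correct response's score
    over the incorrect one's, plus a plain CE loss on the ground-truth
    rationale to prevent excessive drift."""
    rank = -F.logsigmoid(score_correct - score_incorrect).mean()
    return rank_weight * rank + ce_ground_truth
```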

## Hyperparameters

For the initial supervised finetuning step:

- Adalite optimizer, default hyperparameters of supertrainer2000 unless otherwise specified
- Lambda (Adalite's analogue to weight decay) of 0.01
- LR of 1e-5
- MixCE ratio of 0.75
- Sequence length of 4096
- Cosine decay with a 20% warmup
- Frozen embeddings
- No training on inputs
- Accumulated batch size of 128
- NEFTune with an alpha of 10
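
For reference, NEFTune perturbs the token embeddings during training. A minimal sketch following the scaling from the NEFTune paper (how supertrainer2000 hooks this into the forward pass is not shown):

```python
import torch

def neftune(embeddings: torch.Tensor, alpha: float = 10.0) -> torch.Tensor:
    """Add uniform noise to token embeddings, scaled by
    alpha / sqrt(seq_len * hidden_dim) as in the NEFTune paper."""
    _, seq_len, hidden_dim = embeddings.shape
    scale = alpha / (seq_len * hidden_dim) ** 0.5
    return embeddings + torch.empty_like(embeddings).uniform_(-scale, scale)
```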

For the generations:

- Generated using the current git version of vllm
- N=8
- Temperature of 0.5
- top_p of 0.8
- Maximum of 512 generated tokens, discarding responses that do not have a valid rationale and answer
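
These settings map directly onto vLLM's sampling parameters; a minimal sketch (the checkpoint path and prompt formatting are illustrative, and the rationale/answer validity filter is elided):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="./current-checkpoint")  # path to the current iteration's model
params = SamplingParams(n=8, temperature=0.5, top_p=0.8, max_tokens=512)

prompts = ["..."]  # TinyCoT questions, formatted with the training template
outputs = llm.generate(prompts, params)
# Responses lacking a valid rationale and answer would be filtered out here.
```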

For the rank finetuning:

- Adalite optimizer, default hyperparameters of supertrainer2000 unless otherwise specified
- Lambda of 0.01
- LR of 5e-7
- Rank loss weight of 5
- Sequence length of 1024
- Cosine schedule with 10% warmup
- Frozen embeddings
- No training on inputs
- Accumulated batch size of 128
- NEFTune with an alpha of 10
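
Both finetuning stages freeze the embeddings and skip the loss on input tokens. A minimal sketch of how this is commonly done with transformers (not necessarily how supertrainer2000 implements it):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-3b-4e1t", trust_remote_code=True
)

# Frozen embeddings: exclude the input embedding matrix from updates.
model.get_input_embeddings().weight.requires_grad_(False)

def make_labels(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """'No training on inputs': copy input_ids as labels, but set prompt
    positions to -100 so the cross-entropy loss ignores them."""
    labels = input_ids.clone()
    labels[..., :prompt_len] = -100
    return labels
```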