---
license: cc-by-sa-3.0
datasets:
  - euclaise/TinyCoT
  - euclaise/reddit-instruct
  - sablo/oasst2_curated
library_name: transformers
tags:
  - supertrainer2000
---

Memphis-CoT is a finetune of StableLM-3B-4E1T on TinyCoT, along with reddit-instruct and a curated subset of oasst2.

Memphis was trained only on human data! No GPT generations here.

Finetuning was performed with my supertrainer2000 framework, using my Adalite optimizer.

## Training Procedure

I finetuned the model using an iterative rationale-bootstrapping procedure inspired by STaR and SPIN.

First, I finetuned the model on all of the datasets for 2 epochs, using a MixCE loss and NEFTune.
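
MixCE mixes the standard forward cross-entropy with an approximation of the reverse cross-entropy. As a rough illustration only (not supertrainer2000's actual implementation), here is a minimal PyTorch sketch of the common token-level approximation, where the reverse-CE term reweights each gold token's NLL by the model's own detached probability of that token; the 0.75 ratio matches the hyperparameters listed below:

```python
import torch
import torch.nn.functional as F

def mixce_loss(logits: torch.Tensor, labels: torch.Tensor, ratio: float = 0.75):
    """ratio * forward CE + (1 - ratio) * approximate reverse CE.

    The reverse-CE term is approximated by weighting each gold token's
    negative log-likelihood by the model's own (detached) probability of
    that token, so tokens the model already finds likely are reinforced
    more heavily. Labels of -100 mark ignored (prompt) positions.
    """
    logits, labels = logits[:, :-1], labels[:, 1:]      # next-token shift
    logp = F.log_softmax(logits, dim=-1)
    mask = labels != -100
    nll = -logp.gather(-1, labels.clamp_min(0).unsqueeze(-1)).squeeze(-1)
    p_gold = (-nll).exp().detach()                      # model prob of gold token
    per_token = (ratio + (1.0 - ratio) * p_gold) * nll
    return (per_token * mask).sum() / mask.sum()
```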

I then performed the following steps 3 times:

1. Generate responses for each question in TinyCoT using the current model, check each response for correctness, and build a dataset of (correct, incorrect) pairs. Duplicates and leftover responses are discarded, so each correct and each incorrect response appears in at most one pair (see the sketch after this list).
2. Finetune the model for 1 epoch using a ranking loss over the length-normalized log-probabilities of the correct vs. incorrect generated responses, similar to Preference Ranking Optimization. A standard CE loss over the ground-truth rationale was included to prevent excessive drift (a sketch follows the next paragraph).
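
As a concrete sketch of step 1's pair construction (`check_answer` is a hypothetical placeholder for the TinyCoT correctness check, not an actual API):

```python
def build_pairs(question, responses, check_answer):
    """Deduplicate generated responses, split them by correctness, and
    pair each unique correct response with a unique incorrect one."""
    unique = list(dict.fromkeys(responses))          # dedupe, keep order
    correct = [r for r in unique if check_answer(question, r)]
    incorrect = [r for r in unique if not check_answer(question, r)]
    return list(zip(correct, incorrect))             # extras are discarded
```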

This should be more efficient than either STaR or SPIN, as it uses a ranking loss rather than rejection sampling (unlike STaR), and verifies correctness instead of assuming all model responses are incorrect (unlike SPIN).
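
To make the ranking objective concrete, here is a minimal sketch of step 2's loss under my reading of the description above; the exact form in supertrainer2000 may differ, and the weight of 5 matches the rank loss weight listed below:

```python
import torch
import torch.nn.functional as F

def seq_score(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Length-normalized log-probability of each sequence; label -100
    marks prompt/padding positions excluded from the score."""
    logp = F.log_softmax(logits[:, :-1], dim=-1)
    labels = labels[:, 1:]
    mask = labels != -100
    tok = logp.gather(-1, labels.clamp_min(0).unsqueeze(-1)).squeeze(-1)
    return (tok * mask).sum(-1) / mask.sum(-1)

def rank_loss(score_correct, score_incorrect, ce_ground_truth, rank_weight=5.0):
    """PRO-style pairwise term that prefers the correct response's score
    over the incorrect one's, plus a plain CE loss on the ground-truth
    rationale to prevent excessive drift."""
    rank = -F.logsigmoid(score_correct - score_incorrect).mean()
    return rank_weight * rank + ce_ground_truth
```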

## Hyperparameters

For the initial supervised finetuning step:

- Adalite optimizer, default hyperparameters of supertrainer2000 unless otherwise specified
- Lambda (Adalite's analogue to weight decay) of 0.01
- LR of 1e-5
- MixCE ratio of 0.75
- Sequence length of 4096
- Cosine decay with a 20% warmup
- Frozen embeddings
- No training on inputs
- Accumulated batch size of 128
- NEFTune with an alpha of 10
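
For reference, NEFTune perturbs the token embeddings during training. A minimal sketch following the scaling from the NEFTune paper (how supertrainer2000 hooks this into the forward pass is not shown):

```python
import torch

def neftune(embeddings: torch.Tensor, alpha: float = 10.0) -> torch.Tensor:
    """Add uniform noise to token embeddings, scaled by
    alpha / sqrt(seq_len * hidden_dim) as in the NEFTune paper."""
    _, seq_len, hidden_dim = embeddings.shape
    scale = alpha / (seq_len * hidden_dim) ** 0.5
    return embeddings + torch.empty_like(embeddings).uniform_(-scale, scale)
```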

For the generations:

- Generated using the current git version of vllm
- N=8
- Temperature of 0.5
- top_p of 0.8
- Maximum of 512 generated tokens, discarding responses that do not have a valid rationale and answer
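
These settings map directly onto vLLM's sampling parameters; a minimal sketch (the checkpoint path and prompt formatting are illustrative, and the rationale/answer validity filter is elided):

```python
from vllm import LLM, SamplingParams

llm = LLM(model="./current-checkpoint")  # path to the current iteration's model
params = SamplingParams(n=8, temperature=0.5, top_p=0.8, max_tokens=512)

prompts = ["..."]  # TinyCoT questions, formatted with the training template
outputs = llm.generate(prompts, params)
# Responses lacking a valid rationale and answer would be filtered out here.
```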

For the rank finetuning:

- Adalite optimizer, default hyperparameters of supertrainer2000 unless otherwise specified
- Lambda of 0.01
- LR of 5e-7
- Rank loss weight of 5
- Sequence length of 1024
- Cosine schedule with 10% warmup
- Frozen embeddings
- No training on inputs
- Accumulated batch size of 128
- NEFTune with an alpha of 10
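
Both finetuning stages freeze the embeddings and skip the loss on input tokens. A minimal sketch of how this is commonly done with transformers (not necessarily how supertrainer2000 implements it):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-3b-4e1t", trust_remote_code=True
)

# Frozen embeddings: exclude the input embedding matrix from updates.
model.get_input_embeddings().weight.requires_grad_(False)

def make_labels(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """'No training on inputs': copy input_ids as labels, but set prompt
    positions to -100 so the cross-entropy loss ignores them."""
    labels = input_ids.clone()
    labels[..., :prompt_len] = -100
    return labels
```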