---
license: cc-by-sa-3.0
datasets:
- euclaise/TinyCoT
- euclaise/reddit-instruct
- sablo/oasst2_curated
library_name: transformers
tags:
- supertrainer2000
---
Memphis-CoT is a finetune of StableLM-3B-4E1T on TinyCoT, together with reddit-instruct and a curated subset of oasst2.
Memphis was trained only on human data! No GPT generations here.
Finetuning was performed with my supertrainer2000 framework and my Adalite optimizer.
Training Procedure
I finetuned the model using an iterative rationale-bootstrapping procedure inspired by STaR and SPIN.
First, I finetuned the model on all the datasets for 2 epochs using a MixCE loss and NEFTune.
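MixCE mixes the usual forward cross-entropy with an approximation of reverse cross-entropy. The sketch below is a minimal per-sequence version, assuming the common approximation in which the reverse term is each token's CE weighted by the model's own probability of that token (held constant, i.e. no gradient through the weight), and assuming the "MixCE ratio of 0.75" listed under the hyperparameters is the weight on the forward term. `mixce_loss` is a hypothetical helper, not supertrainer2000's API.

```python
import math


def mixce_loss(target_logprobs, mix_ratio=0.75):
    """Approximate MixCE loss for one sequence (hypothetical helper).

    target_logprobs: log-probabilities the model assigns to each ground-truth token.
    mix_ratio: assumed weight on forward CE; (1 - mix_ratio) goes to the
    probability-weighted approximation of reverse CE.
    """
    losses = []
    for lp in target_logprobs:
        ce = -lp  # forward cross-entropy for this token
        # Model probability of the token; in an autograd framework this weight
        # would be detached so gradients flow only through `ce`.
        q = math.exp(lp)
        losses.append(mix_ratio * ce + (1.0 - mix_ratio) * q * ce)
    # Average over tokens in the sequence.
    return sum(losses) / len(losses)
```

With `mix_ratio=1.0` this reduces to ordinary token-averaged cross-entropy; lower ratios down-weight tokens the model already finds unlikely.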
I then performed the following steps 3 times:
- Generate responses for each question in TinyCoT using the current model, check each response for correctness, and build a dataset of (correct, incorrect) pairs. Leftover responses are discarded, so that each correct and each incorrect response appears in at most one pair.
- Finetune the model for 1 epoch using a ranking loss over the length-normalized log-probabilities of each sequence, similar to Preference Ranking Optimization, that compares the correct and incorrect generated responses in each pair. A standard CE loss over the ground truth was included to prevent excessive drift.
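The pair-construction step can be sketched as follows. This is a minimal illustration, assuming duplicate generations are dropped first and that pairs are formed one-to-one in generation order; `build_pairs` and the `is_correct` checker are hypothetical, since the card does not say how answers are verified.

```python
def build_pairs(responses, is_correct):
    """Split sampled responses for one question into unique (correct, incorrect) pairs.

    responses: generated texts for one question (e.g. N=8 samples).
    is_correct: hypothetical checker returning True if a response's answer is right.
    """
    correct, incorrect = [], []
    seen = set()
    for r in responses:
        if r in seen:  # assumption: identical generations are dropped
            continue
        seen.add(r)
        (correct if is_correct(r) else incorrect).append(r)
    # zip stops at the shorter list, so leftover responses are discarded and
    # every response appears in at most one pair.
    return list(zip(correct, incorrect))
```

A question where every sample is correct (or every sample is wrong) yields no pairs and contributes nothing to the ranking stage.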
This procedure should be more efficient than either STaR or SPIN: unlike STaR, it uses a ranking loss rather than rejection sampling, and unlike SPIN, it verifies correctness instead of assuming every model response is incorrect.
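With only two candidates per pair, a PRO-style listwise loss reduces to the negative log-softmax of the correct response's score, i.e. softplus(s_incorrect - s_correct), where each score is the mean token log-probability. The sketch below assumes no extra temperature on the scores and assumes the total loss is the ground-truth CE plus the ranking loss scaled by the rank loss weight (5 in the rank-finetuning hyperparameters); the function names are hypothetical.

```python
import math


def length_normalized_score(token_logprobs):
    """Mean per-token log-probability of a sequence."""
    return sum(token_logprobs) / len(token_logprobs)


def pairwise_rank_loss(correct_logprobs, incorrect_logprobs):
    """-log(exp(s_c) / (exp(s_c) + exp(s_i))) for one (correct, incorrect) pair."""
    s_c = length_normalized_score(correct_logprobs)
    s_i = length_normalized_score(incorrect_logprobs)
    x = s_i - s_c
    # Numerically stable softplus(x) = log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def total_loss(correct_logprobs, incorrect_logprobs, ground_truth_logprobs, rank_weight=5.0):
    """Ground-truth CE (anti-drift term) plus the weighted ranking loss."""
    ce = -sum(ground_truth_logprobs) / len(ground_truth_logprobs)
    return ce + rank_weight * pairwise_rank_loss(correct_logprobs, incorrect_logprobs)
```

Length normalization means a long correct rationale is not penalized merely for accumulating more negative log-probability than a short incorrect one.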
Hyperparameters
For the initial supervised finetuning step:
- Adalite optimizer, default hyperparameters of supertrainer2000 unless otherwise specified
- Lambda (Adalite's analogue to weight decay) of 0.01
- LR of 1e-5
- MixCE ratio of 0.75
- Sequence length of 4096
- Cosine decay with a 20% warmup
- Frozen embeddings
- No training on inputs
- Accumulated batch size of 128
- NEFTune with an alpha of 10
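NEFTune adds uniform noise to the input embeddings during training, scaled by alpha / sqrt(L * d) for sequence length L and embedding dimension d, as described in the NEFTune paper. The sketch below applies that rule to a plain nested-list embedding matrix; `neftune_noise` is a hypothetical helper, and a real implementation would operate on the embedding layer's output tensor during training only.

```python
import math
import random


def neftune_noise(embeddings, alpha=10.0, rng=random):
    """Return a noisy copy of a (seq_len x dim) embedding matrix (training only).

    Noise is Uniform(-1, 1) scaled by alpha / sqrt(seq_len * dim).
    """
    seq_len = len(embeddings)
    dim = len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    return [[x + scale * rng.uniform(-1.0, 1.0) for x in row] for row in embeddings]
```

For example, with alpha 10, 25 tokens and 4 dimensions, the scale is 10 / sqrt(100) = 1, so every entry moves by at most 1.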
For the generations:
- Generated using the current git version of vllm
- N=8
- Temperature of 0.5
- top_p of 0.8
- Maximum of 512 generated tokens, discarding responses that do not have a valid rationale and answer
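The sampling settings above map onto vLLM's `SamplingParams` roughly as follows. This is a sketch only: `sample_responses`, the checkpoint path, and the question list are hypothetical, prompt formatting is not specified here, and the rationale/answer validity filter is omitted because its exact format is not given.

```python
from vllm import LLM, SamplingParams

# n samples per prompt, temperature, nucleus sampling, and a 512-token cap.
sampling_params = SamplingParams(n=8, temperature=0.5, top_p=0.8, max_tokens=512)


def sample_responses(model_path, questions):
    """Return 8 sampled completions per question (hypothetical helper)."""
    llm = LLM(model=model_path)  # e.g. the current iteration's checkpoint
    outputs = llm.generate(questions, sampling_params)
    # Each RequestOutput holds n CompletionOutputs; keep only their text.
    return [[c.text for c in out.outputs] for out in outputs]
```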
For the rank finetuning:
- Adalite optimizer, default hyperparameters of supertrainer2000 unless otherwise specified
- Lambda of 0.01
- LR of 5e-7
- Rank loss weight of 5
- Sequence length of 1024
- Cosine schedule with 10% warmup
- Frozen embeddings
- No training on inputs
- Accumulated batch size of 128
- NEFTune with an alpha of 10
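Both stages use a cosine schedule with warmup (20% warmup at LR 1e-5 for the supervised step, 10% warmup at LR 5e-7 for rank finetuning). A minimal sketch, assuming linear warmup from 0 followed by cosine decay to 0, the same shape as Hugging Face's `get_cosine_schedule_with_warmup`; whether supertrainer2000 uses exactly this shape is an assumption, and `cosine_lr` is a hypothetical helper.

```python
import math


def cosine_lr(step, total_steps, peak_lr, warmup_frac):
    """Learning rate at `step` for linear warmup then cosine decay (assumed shape)."""
    warmup_steps = round(total_steps * warmup_frac)
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr at the end of warmup to 0 at total_steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For the rank-finetuning stage over 100 steps, the LR reaches 5e-7 at step 10, is about half that at step 55, and reaches 0 at step 100.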