---
license: cc-by-sa-3.0
datasets:
- euclaise/TinyCoT
- euclaise/reddit-instruct
- sablo/oasst2_curated
library_name: transformers
tags:
- supertrainer2000
---

Memphis-CoT is a finetune of [StableLM 3b 4e1t](https://huggingface.co/stabilityai/stablelm-3b-4e1t) on [TinyCoT](https://huggingface.co/datasets/euclaise/TinyCoT), along with [reddit-instruct](https://huggingface.co/datasets/euclaise/reddit-instruct) and a [curated](https://huggingface.co/datasets/sablo/oasst2_curated) subset of [oasst2](https://huggingface.co/datasets/OpenAssistant/oasst2).

**Memphis was trained *only* on human data! No GPT generations here.**

Finetuning was performed with my [supertrainer2000](https://github.com/euclaise/supertrainer2000) framework, using my Adalite optimizer.
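
Since the card lists `library_name: transformers`, the model should load with the standard `transformers` API. A minimal sketch, assuming the checkpoint id `euclaise/Memphis-CoT-3B` (this repo's name); the exact prompt format is not specified here, so the prompt below is only a placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Checkpoint id is an assumption based on this repo's name.
tokenizer = AutoTokenizer.from_pretrained("euclaise/Memphis-CoT-3B")
model = AutoModelForCausalLM.from_pretrained(
    "euclaise/Memphis-CoT-3B",
    torch_dtype="auto",
)

inputs = tokenizer("Q: What is 7 * 8?\nA:", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```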

### Training Procedure
I finetuned the model using an iterative rationale-bootstrapping procedure inspired by [STaR](https://research.google/pubs/star-self-taught-reasoner-bootstrapping-reasoning-with-reasoning/) and [SPIN](https://arxiv.org/abs/2401.01335).

First, I finetuned the model on all the datasets using a [MixCE](https://arxiv.org/abs/2305.16958) loss and [NEFTune](https://arxiv.org/abs/2310.05914) for 2 epochs.
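
For reference, here is a rough PyTorch sketch of the MixCE objective as I read it from the paper: forward cross-entropy mixed with the paper's self-reinforced approximation of reverse cross-entropy. Treating the 0.75 ratio (listed under hyperparameters below) as the weight on the forward-CE term is my assumption, and this is an illustrative reading, not the actual supertrainer2000 implementation:

```python
import torch.nn.functional as F

def mixce_loss(logits, labels, ratio=0.75, ignore_index=-100):
    # Shift for causal LM: tokens < t predict token t.
    logits, labels = logits[:, :-1], labels[:, 1:]
    mask = labels != ignore_index

    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, labels.clamp(min=0).unsqueeze(-1)).squeeze(-1)

    ce = -token_logp  # forward cross-entropy per token
    # Self-reinforced reverse-CE approximation: reweight each token's CE
    # by the model's own (detached) probability of the gold token.
    q = token_logp.detach().exp()
    loss = (ratio + (1.0 - ratio) * q) * ce

    return (loss * mask).sum() / mask.sum()
```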

I then performed the following steps 3 times:
1. Generate responses for each question in TinyCoT using the current model, check each response for correctness, and create a dataset of (correct, incorrect) pairs. Extra values are discarded so that each correct and each incorrect response is unique.
2. Finetune the model for 1 epoch using a ranking loss over length-normalized log-probabilities of each sequence, similar to [Preference Ranking Optimization](https://arxiv.org/abs/2306.17492), comparing the correct vs. the incorrect generated response (see the sketch after this list). A standard CE loss over the ground-truth data was included to prevent excessive drift.
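
A minimal PyTorch sketch of one plausible form of this objective: a pairwise ranking term over length-normalized sequence log-probabilities, plus a standard CE anchor on the ground truth. The function names, and reducing PRO's listwise loss to the two-response case, are my assumptions rather than the exact supertrainer2000 code:

```python
import torch.nn.functional as F

def seq_logprob(logits, labels, ignore_index=-100):
    # Length-normalized log-probability of each sequence in the batch.
    logits, labels = logits[:, :-1], labels[:, 1:]
    mask = labels != ignore_index
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, labels.clamp(min=0).unsqueeze(-1)).squeeze(-1)
    return (token_logp * mask).sum(-1) / mask.sum(-1)

def rank_plus_ce_loss(correct_score, incorrect_score,
                      gt_logits, gt_labels, rank_weight=5.0):
    # PRO-style listwise loss over {correct, incorrect} reduces to a
    # pairwise softmax: push the correct response above the incorrect one.
    rank = -F.logsigmoid(correct_score - incorrect_score).mean()
    # CE over the ground-truth targets to limit drift.
    ce = F.cross_entropy(
        gt_logits[:, :-1].reshape(-1, gt_logits.size(-1)),
        gt_labels[:, 1:].reshape(-1),
        ignore_index=-100,
    )
    return rank_weight * rank + ce
```

Here `correct_score` and `incorrect_score` would come from `seq_logprob` applied to the model's logits on each generated response; `rank_weight=5.0` mirrors the "rank loss weight of 5" listed below.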

This should be more efficient than either STaR or SPIN: it uses a ranking loss rather than rejection sampling (unlike STaR), and it verifies correctness instead of assuming that all model-generated responses are incorrect (unlike SPIN).

### Hyperparameters

For the initial supervised finetuning step:
- Adalite optimizer, with the default hyperparameters of supertrainer2000 unless otherwise specified
- Lambda (Adalite's analogue to weight decay) of 0.01
- LR of 1e-5
- MixCE ratio of 0.75
- Sequence length of 4096
- Cosine decay with a 20% warmup
- Frozen embeddings
- No training on inputs
- Accumulated batch size of 128
- NEFTune with an alpha of 10 (see the sketch after this list)
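
NEFTune adds uniform noise to the input embeddings during training. A minimal sketch following the scaling rule from the NEFTune paper (noise magnitude alpha / sqrt(L * d)); the function name is illustrative, not supertrainer2000's API:

```python
import torch

def neftune_noise(embeds, alpha=10.0):
    # embeds: (batch, seq_len, dim) input embeddings for a training batch.
    # Uniform noise in [-1, 1], scaled by alpha / sqrt(seq_len * dim),
    # as in the NEFTune paper.
    _, seq_len, dim = embeds.shape
    scale = alpha / (seq_len * dim) ** 0.5
    return embeds + torch.empty_like(embeds).uniform_(-scale, scale)
```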

For the generations:
- Generated using the current git version of `vllm` (see the sketch after this list)
- N=8 (8 sampled responses per question)
- Temperature of 0.5
- `top_p` of 0.8
- Maximum of 512 generated tokens, discarding responses that do not have a valid rationale and answer
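
Putting those settings together, roughly how the sampling might look with vllm; the model path is a placeholder for the current intermediate checkpoint at each iteration, and `tinycot_prompts` is a hypothetical list of formatted TinyCoT questions:

```python
from vllm import LLM, SamplingParams

# Placeholder path: each iteration samples from the current checkpoint.
llm = LLM(model="path/to/current-checkpoint")
params = SamplingParams(n=8, temperature=0.5, top_p=0.8, max_tokens=512)

outputs = llm.generate(tinycot_prompts, params)  # tinycot_prompts: list[str]
# Each output carries n=8 candidate completions to check for correctness.
```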

For the rank finetuning:
- Adalite optimizer, with the default hyperparameters of supertrainer2000 unless otherwise specified
- Lambda of 0.01
- LR of 5e-7
- Rank loss weight of 5
- Sequence length of 1024
- Cosine schedule with a 10% warmup
- Frozen embeddings
- No training on inputs
- Accumulated batch size of 128
- NEFTune with an alpha of 10