Jellywibble/dalio-pretrain-cleaned-v4

Tags: Text Generation, text-generation-inference, Inference Endpoints


Model Description

Pre-training on a cleaned version of Ray Dalio's Principles. Steps:

removing numeric footnote references
removing enumeration markers, e.g. 1) ... 2) ... 3) ...
correcting grammar, i.e. ensuring every full stop is followed by a space
fine-tuning the OPT-30B model on the resulting dataset
Dataset location: Jellywibble/dalio-principles-cleaned-v3
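The three cleaning steps could be approximated with regular expressions, as in the minimal sketch below. The card does not describe how footnotes or enumeration markers appear in the raw text, so the patterns are assumptions (footnote digits glued to the end of a word, markers such as "1)" starting a token, a capital letter directly after a full stop), and clean_text is a hypothetical helper, not the author's cleaning script.

```python
import re

# Hypothetical, heuristic patterns: the card does not say how footnotes or
# enumeration markers are formatted in the raw book text.

# A 1-3 digit footnote number glued to a word, optionally after punctuation,
# e.g. "progress.12 Next" -> "progress. Next", "Dalio3 said" -> "Dalio said".
FOOTNOTE_RE = re.compile(r"(?<=[A-Za-z])([.,;:!?]?)\d{1,3}\b")

# Enumeration markers such as "1) " at the start of a whitespace-delimited token.
ENUM_RE = re.compile(r"(?<!\S)\d+\)\s*")

# A full stop immediately followed by a capital letter, e.g. "act.Then".
# Restricting the fix to capitals leaves decimals such as "3.5" untouched.
MISSING_SPACE_RE = re.compile(r"\.(?=[A-Z])")


def clean_text(text: str) -> str:
    text = FOOTNOTE_RE.sub(r"\1", text)      # drop footnote digits, keep punctuation
    text = ENUM_RE.sub("", text)             # drop "1) ", "2) ", ...
    text = MISSING_SPACE_RE.sub(". ", text)  # add the missing space after a full stop
    return re.sub(r"[ \t]{2,}", " ", text).strip()  # collapse leftover space runs


sample = "Write it down.1 Steps: 1) plan 2) act.Then reflect."
print(clean_text(sample))  # Write it down. Steps: plan act. Then reflect.
```

Being heuristics, these patterns will also strip digits from tokens such as "A40", so a real pipeline would need patterns matched to the actual source formatting.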

Metrics

Checkpoint 8 is the served checkpoint.
Hellaswag perplexity: 30.65
Eval loss: 2.289
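If the eval loss is the mean per-token cross-entropy in nats (the quantity the Hugging Face Trainer reports for causal language modelling), it corresponds to a perplexity of exp(2.289), roughly 9.87, on the evaluation split. The card does not say which split that is, and the Hellaswag perplexity is measured on a different dataset, so the two numbers are not directly comparable. A minimal sketch of the conversion:

```python
import math


def perplexity_from_loss(mean_ce_loss_nats: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy), assuming the loss is in nats."""
    return math.exp(mean_ce_loss_nats)


eval_ppl = perplexity_from_loss(2.289)  # eval loss reported above
print(f"{eval_ppl:.2f}")  # 9.87
```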

wandb link: https://wandb.ai/jellywibble/huggingface/runs/2jqc504o?workspace=user-jellywibble

Model Parameters

Trained on 4x NVIDIA A40 GPUs with an effective batch size of 8.

base_model_name: facebook/opt-30b
dataset_name: Jellywibble/dalio-principles-cleaned-v3
block_size: 1024
gradient_accumulation_steps: 2
per_device_train_batch_size: 1
seed: 2
num_train_epochs: 1
learning_rate: 3e-6
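The effective batch size of 8 follows from these parameters, assuming plain data parallelism with one process per GPU: 4 GPUs x 1 sequence per device x 2 accumulation steps. The sketch below spells out that arithmetic. TrainConfig is a hypothetical container, and tokens_per_optimizer_step additionally assumes every training sequence is packed to the full block_size (as the Hugging Face run_clm.py example does when grouping texts); the card does not confirm either assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Defaults copied from the parameter list above.
    num_gpus: int = 4                      # 4x A40
    per_device_train_batch_size: int = 1
    gradient_accumulation_steps: int = 2
    block_size: int = 1024                 # tokens per training sequence

    @property
    def effective_batch_size(self) -> int:
        # Sequences contributing to one optimizer step under data parallelism.
        return (self.num_gpus
                * self.per_device_train_batch_size
                * self.gradient_accumulation_steps)

    @property
    def tokens_per_optimizer_step(self) -> int:
        # Assumes each sequence is packed to exactly block_size tokens.
        return self.effective_batch_size * self.block_size


cfg = TrainConfig()
print(cfg.effective_batch_size)       # 8
print(cfg.tokens_per_optimizer_step)  # 8192
```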

Notes

The effective batch size must be at least 8.
Learning rates higher than 3e-6 cause severe overfitting, visible as much worse Hellaswag metrics.
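These two notes could be encoded as a pre-flight check when reproducing the run on different hardware. The helpers below are hypothetical, not part of the original training setup: accumulation_steps_for picks the smallest gradient_accumulation_steps that keeps the effective batch at or above 8 (assuming one data-parallel process per GPU), and check_learning_rate rejects rates above 3e-6. The card does not establish whether these thresholds still hold at other batch sizes.

```python
import math

MIN_EFFECTIVE_BATCH = 8   # "at least 8", per the notes above
MAX_LEARNING_RATE = 3e-6  # higher rates overfit, per the notes above


def accumulation_steps_for(num_gpus: int, per_device_batch: int,
                           min_effective: int = MIN_EFFECTIVE_BATCH) -> int:
    """Smallest gradient_accumulation_steps keeping the effective batch >= min_effective."""
    per_step = num_gpus * per_device_batch
    return max(1, math.ceil(min_effective / per_step))


def check_learning_rate(lr: float) -> None:
    """Reject learning rates above the ceiling reported in the notes."""
    if lr > MAX_LEARNING_RATE:
        raise ValueError(f"learning_rate={lr} exceeds {MAX_LEARNING_RATE}; expect overfitting")


print(accumulation_steps_for(4, 1))  # 2, the setting used for this model
print(accumulation_steps_for(2, 1))  # 4
check_learning_rate(3e-6)            # within the limit, no error
```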
