ProgressGym-HistLlama3-8B-C014-instruct

Overview

The ProgressGym Framework

ProgressGym-HistLlama3-8B-C014-instruct is part of the ProgressGym framework for research and experimentation on progress alignment - the emulation of moral progress in AI alignment algorithms, as a measure to prevent risks of societal value lock-in.

To quote the paper ProgressGym: Alignment with a Millennium of Moral Progress:

Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale.

We introduce progress alignment as a technical solution to mitigate this imminent risk. Progress alignment algorithms learn to emulate the mechanics of human moral progress, thereby addressing the susceptibility of existing alignment methods to contemporary moral blindspots.

ProgressGym-HistLlama3-8B-C014-instruct

ProgressGym-HistLlama3-8B-C014-instruct is one of the 36 historical language models in the ProgressGym framework.

ProgressGym-HistLlama3-8B-C014-instruct is under continual iteration. New versions of the model are being trained to improve on the current one and to reflect historical moral tendencies more comprehensively.

ProgressGym-HistLlama3-8B-C014-instruct is a 14th-century historical language model. Based on Meta-Llama-3-8B, it underwent continued pretraining on the 14th-century text data from ProgressGym-HistText, using the following hyperparameters:

learning_rate: 1.5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 64
total_eval_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: polynomial
lr_scheduler_warmup_steps: 20
num_epochs: 4.0
mixed_precision_training: Native AMP
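As a quick sanity check on these settings, the sketch below recomputes the total batch size and estimates how many optimizer steps and training examples make up one epoch. It assumes no gradient accumulation (consistent with the listed total of 64 = 8 x 8) and reads the step-to-epoch ratio off the results table for this stage (step 65 is logged at epoch 0.9848). The variable names are illustrative, not taken from the training code.

```python
# Values copied from the hyperparameter list.
train_batch_size = 8   # per-device batch size
num_devices = 8

# Assumption: gradient accumulation of 1, so the total is a plain product.
total_train_batch_size = train_batch_size * num_devices

# The results table logs step 65 at epoch 0.9848, which implies about 66 steps per epoch.
steps_per_epoch = round(65 / 0.9848)

# Upper bound on training examples per epoch (the last batch may be partial).
examples_per_epoch = steps_per_epoch * total_train_batch_size

print(total_train_batch_size, steps_per_epoch, examples_per_epoch)
```

Under these assumptions, one epoch of continued pretraining covers at most a few thousand training examples; the source does not say how long each example is.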

Training results for this stage:

Training Loss	Epoch	Step	Validation Loss
2.5789	0.0152	1	2.6458
2.5672	0.0758	5	2.6280
2.5751	0.1515	10	2.5314
2.418	0.2273	15	2.4634
2.4701	0.3030	20	2.4177
2.3904	0.3788	25	2.3785
2.3539	0.4545	30	2.3378
2.3101	0.5303	35	2.3082
2.3254	0.6061	40	2.2816
2.2762	0.6818	45	2.2614
2.2525	0.7576	50	2.2458
2.2777	0.8333	55	2.2321
2.2054	0.9091	60	2.2206
2.237	0.9848	65	2.2113
1.986	1.0606	70	2.2115
1.9373	1.1364	75	2.2217
1.9228	1.2121	80	2.2132
1.9084	1.2879	85	2.2118
1.9684	1.3636	90	2.2122
1.9126	1.4394	95	2.2094
1.9101	1.5152	100	2.2066
1.8496	1.5909	105	2.2058
1.9154	1.6667	110	2.2057
1.9233	1.7424	115	2.2056
1.9198	1.8182	120	2.2052
1.9229	1.8939	125	2.2048
1.8913	1.9697	130	2.2045
1.8814	2.0455	135	2.2046
1.8813	2.1212	140	2.2051
1.8912	2.1970	145	2.2058
1.9184	2.2727	150	2.2065
1.8662	2.3485	155	2.2071
1.8809	2.4242	160	2.2074
1.8591	2.5	165	2.2077
1.8731	2.5758	170	2.2079
1.8948	2.6515	175	2.2082
1.8876	2.7273	180	2.2082
1.8408	2.8030	185	2.2083
1.8931	2.8788	190	2.2082
1.8569	2.9545	195	2.2080
1.8621	3.0303	200	2.2079
1.8863	3.1061	205	2.2078
1.9021	3.1818	210	2.2079
1.8648	3.2576	215	2.2080
1.8443	3.3333	220	2.2081
1.8978	3.4091	225	2.2080
1.8658	3.4848	230	2.2080
1.8706	3.5606	235	2.2079
1.8855	3.6364	240	2.2078
1.8535	3.7121	245	2.2078
1.9062	3.7879	250	2.2079
1.8628	3.8636	255	2.2078
1.8484	3.9394	260	2.2077
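One way to read the table is to locate the step with the lowest validation loss and to track the gap between validation and training loss. The sketch below does this on a hand-picked subset of rows copied from the table; the subset happens to include the row with the lowest validation loss in the full table (step 130). The interpretation in the comments is a reading of the numbers, not a claim made by the authors.

```python
# (training_loss, epoch, step, validation_loss): a subset of rows copied from the table.
rows = [
    (2.5789, 0.0152, 1, 2.6458),
    (2.2370, 0.9848, 65, 2.2113),
    (1.9860, 1.0606, 70, 2.2115),
    (1.8913, 1.9697, 130, 2.2045),
    (1.8814, 2.0455, 135, 2.2046),
    (1.8569, 2.9545, 195, 2.2080),
    (1.8484, 3.9394, 260, 2.2077),
]

# Row with the lowest validation loss.
best = min(rows, key=lambda r: r[3])
print(f"lowest validation loss {best[3]} at step {best[2]} (epoch {best[1]})")

# Gap between validation and training loss at each step.
# The gap jumps once the second epoch begins and the model sees the same data again,
# while the validation loss itself stays nearly flat.
gaps = {step: round(val - train, 4) for train, _epoch, step, val in rows}
for step, gap in gaps.items():
    print(f"step {step:>3}  validation - training = {gap:+.4f}")
```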

Note that the training data volume for the continued pretraining stage is capped at 3GB. When the corresponding century's corpus exceeds this volume, the training data is randomly sampled to fit the volume.
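The capping rule can be illustrated with a minimal sketch. The source does not specify the sampling unit, the exact byte value of "3GB", or the random seed, so the function below is a hypothetical illustration: it samples whole documents in a shuffled order and keeps each one that still fits under the cap. The name `sample_to_cap` and its parameters are invented for this example.

```python
import random


def sample_to_cap(doc_sizes, cap_bytes, seed=42):
    """Return sorted indices of a random subset of documents fitting within cap_bytes.

    Hypothetical helper: the real pipeline may sample at a different granularity.
    """
    # If the corpus already fits, keep everything.
    if sum(doc_sizes) <= cap_bytes:
        return list(range(len(doc_sizes)))

    # Visit documents in a reproducible random order.
    order = list(range(len(doc_sizes)))
    random.Random(seed).shuffle(order)

    chosen, total = [], 0
    for i in order:
        # Keep a document only if it does not push the total over the cap.
        if total + doc_sizes[i] <= cap_bytes:
            chosen.append(i)
            total += doc_sizes[i]
    return sorted(chosen)


# Toy usage with made-up document sizes in bytes.
sizes = [400, 300, 200, 500]
picked = sample_to_cap(sizes, cap_bytes=600)
print(picked, sum(sizes[i] for i in picked))
```

For the actual corpus, `cap_bytes` would be on the order of 3 * 10**9, assuming decimal gigabytes.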

ProgressGym-HistLlama3-8B-C014-instruct is an instruction-tuned language model. It is tuned on ProgressGym-TimelessQA, using the following hyperparameters. Note, however, that the snapshot at training step 10 is used for the final model, to minimize erosion of the value tendencies learned during continued pretraining; we qualitatively observe that this snapshot still possesses strong instruction-following capabilities.

learning_rate: 1.5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 64
total_eval_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: polynomial
lr_scheduler_warmup_steps: 20
num_epochs: 4.0
mixed_precision_training: Native AMP
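Because the final model is the step-10 snapshot and warmup lasts 20 steps, it is worth checking where step 10 sits on the learning-rate schedule. The sketch below implements a linear warmup followed by polynomial decay. The decay power (1.0), the final learning rate (1e-7), and the total of 192 steps (4 epochs at about 48 steps per epoch, inferred from the results table for this stage) are assumptions, since the source lists only the scheduler type and warmup length. Frameworks may also index scheduler steps with an offset of one.

```python
def polynomial_lr(step, peak_lr=1.5e-5, warmup_steps=20, total_steps=192,
                  end_lr=1e-7, power=1.0):
    """Learning rate at a given optimizer step (sketch; defaults are assumptions)."""
    # Linear warmup from 0 to peak_lr.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    # After the schedule ends, hold the final rate.
    if step > total_steps:
        return end_lr
    # Polynomial decay from peak_lr down to end_lr.
    remaining = 1 - (step - warmup_steps) / (total_steps - warmup_steps)
    return (peak_lr - end_lr) * remaining ** power + end_lr


for s in (1, 10, 20, 100, 192):
    print(s, polynomial_lr(s))
```

Under these assumptions, the step-10 snapshot is taken halfway through warmup, at half the peak learning rate, so the instruction-tuning updates applied to it were comparatively small.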

Training results for this stage:

Training Loss	Epoch	Step	Validation Loss
0.9832	0.0208	1	0.9730
0.9463	0.1042	5	0.9421
0.8488	0.2083	10	0.8247
0.7833	0.3125	15	0.8149
0.7797	0.4167	20	0.8403
0.8542	0.5208	25	0.8670
0.8895	0.625	30	0.8718
0.8519	0.7292	35	0.8592
0.8224	0.8333	40	0.8491
0.8538	0.9375	45	0.8384
0.6569	1.0417	50	0.8295
0.437	1.1458	55	0.8457
0.4405	1.25	60	0.8668
0.4331	1.3542	65	0.8671
0.448	1.4583	70	0.8597
0.4673	1.5625	75	0.8514
0.4298	1.6667	80	0.8474
0.4252	1.7708	85	0.8458
0.4429	1.875	90	0.8451
0.4484	1.9792	95	0.8450
0.3634	2.0833	100	0.8455
0.3876	2.1875	105	0.8467
0.3717	2.2917	110	0.8481
0.387	2.3958	115	0.8494
0.3561	2.5	120	0.8505
0.4219	2.6042	125	0.8516
0.3798	2.7083	130	0.8527
0.3551	2.8125	135	0.8537
0.3827	2.9167	140	0.8546
0.3938	3.0208	145	0.8556
0.3805	3.125	150	0.8565
0.3813	3.2292	155	0.8574
0.3894	3.3333	160	0.8582
0.3603	3.4375	165	0.8589
0.3515	3.5417	170	0.8597
0.3433	3.6458	175	0.8605
0.3511	3.75	180	0.8614
0.3599	3.8542	185	0.8620
0.3994	3.9583	190	0.8621
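To try the instruction-tuned model, a minimal sketch using the Hugging Face `transformers` library is given below. It assumes the weights are hosted under the repository ID shown on the model page and that `torch` and `transformers` are installed. The helper name `generate`, the plain-text prompt, and the generation length are illustrative choices; the expected prompt format is not documented here.

```python
MODEL_ID = "PKU-Alignment/ProgressGym-HistLlama3-8B-C014-instruct-v0.2"


def generate(prompt, max_new_tokens=64):
    """Generate a continuation of `prompt` (hypothetical helper for illustration)."""
    # Imported lazily so the heavy dependencies load only when the function is called.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # bfloat16 halves memory relative to float32; this is an assumption about the hardware.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("What makes a ruler just?")` downloads the 8B-parameter weights on first use, so it needs a machine with enough memory and disk space.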

Citation

If the datasets, models, or framework of ProgressGym help you in your project, please cite ProgressGym using the bibtex entry below.

@article{progressgym,
  title={ProgressGym: Alignment with a Millennium of Moral Progress},
  author={Tianyi Qiu and Yang Zhang and Xuchuan Huang and Jasmine Xinze Li and Jiaming Ji and Yaodong Yang},
  journal={arXiv preprint arXiv:2406.20087},
  eprint={2406.20087},
  eprinttype = {arXiv},
  year={2024}
}

Ethics Statement

Copyright information of historical text data sources:
- Project Gutenberg, one of the four sources of our historical text data, consists only of texts in the public domain.
- From the Internet Archive, we include only texts uploaded by the Library of Congress, which the U.S. Library of Congress has freely released online for research and public use.
- The text data from Early English Books Online are, according to their publisher, "freely available to the public" and "available for access, distribution, use, or reuse by anyone".
- The last remaining source of our historical text data, the Pile of Law dataset, is released under a Creative Commons license, which we adhere to in our use.
Reproducibility: To ensure reproducibility, we open-source all the code involved in the production of our main results (including the entire pipeline starting from data collection and model training), as well as the supporting infrastructure (the ProgressGym framework), making replication as easy as running a few simple script files.
Misuse Prevention: In order to prevent potential misuse of progress alignment algorithms, we have carefully formulated progress alignment as strictly value-neutral, without a priori assumptions on the direction of progress. In the event of potential misuse of our dataset, we condemn any misuse attempt to the strongest degree possible, and will work with the research community on whistleblowing for such attempts.
Open-Sourcing: We confirm that our code, data, and models are to be open-sourced under a CC-BY 4.0 license. We will continue to maintain and update our open-source repositories and models.
