metadata
license: cc-by-4.0
tags:
  - alignment
  - value alignment
  - AI safety
  - safety
  - LLM
  - history
datasets:
  - PKU-Alignment/ProgressGym-HistText
  - PKU-Alignment/ProgressGym-TimelessQA
base_model:
  - PKU-Alignment/ProgressGym-HistLlama3-8B-C015-pretrain
  - meta-llama/Meta-Llama-3-8B

ProgressGym-HistLlama3-8B-C015-instruct

Overview

The ProgressGym Framework

Framework Diagram

ProgressGym-HistLlama3-8B-C015-instruct is part of the ProgressGym framework for research and experimentation on progress alignment - the emulation of moral progress in AI alignment algorithms, as a measure to prevent risks of societal value lock-in.

To quote the paper ProgressGym: Alignment with a Millennium of Moral Progress:

Frontier AI systems, including large language models (LLMs), hold increasing influence over the epistemology of human users. Such influence can reinforce prevailing societal values, potentially contributing to the lock-in of misguided moral beliefs and, consequently, the perpetuation of problematic moral practices on a broad scale.

We introduce progress alignment as a technical solution to mitigate this imminent risk. Progress alignment algorithms learn to emulate the mechanics of human moral progress, thereby addressing the susceptibility of existing alignment methods to contemporary moral blindspots.

ProgressGym-HistLlama3-8B-C015-instruct

ProgressGym-HistLlama3-8B-C015-instruct is one of the 36 historical language models in the ProgressGym framework.

ProgressGym-HistLlama3-8B-C015-instruct is under continual iteration: new versions of the model are being trained to reflect historical moral tendencies more comprehensively than the current version does.

ProgressGym-HistLlama3-8B-C015-instruct is a 15th-century historical language model. Based on Meta-Llama-3-8B, it is continued-pretrained on the 15th-century text data from ProgressGym-HistText, using the following hyperparameters:

  • learning_rate: 1.5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: polynomial
  • lr_scheduler_warmup_steps: 20
  • num_epochs: 3.02
  • mixed_precision_training: Native AMP

... with the following training results:

Training Loss Epoch Step Validation Loss
2.6141 0.006494 1 2.6354
2.657 0.032468 5 2.6206
2.6337 0.064935 10 2.5846
2.5268 0.097403 15 2.5516
2.5275 0.129870 20 2.5321
2.5005 0.162338 25 2.5131
2.5339 0.194805 30 2.4961
2.5335 0.227273 35 2.4808
2.4252 0.259740 40 2.4643
2.4445 0.292208 45 2.4518
2.4594 0.324675 50 2.4394
2.4498 0.357143 55 2.4287
2.3821 0.389610 60 2.4184
2.4317 0.422078 65 2.4091
2.3931 0.454545 70 2.4001
2.3695 0.487013 75 2.3934
2.3981 0.519481 80 2.3855
2.3952 0.551948 85 2.3789
2.4137 0.584416 90 2.3721
2.3614 0.616883 95 2.3669
2.3467 0.649351 100 2.3612
2.4012 0.681818 105 2.3569
2.3224 0.714286 110 2.3528
2.3348 0.746753 115 2.3483
2.3573 0.779221 120 2.3448
2.306 0.811688 125 2.3412
2.342 0.844156 130 2.3382
2.3045 0.876623 135 2.3356
2.2959 0.909091 140 2.3330
2.3545 0.941558 145 2.3305
2.3446 0.974026 150 2.3285
2.2502 1.006494 155 2.3268
2.0791 1.038961 160 2.3347
2.1034 1.071429 165 2.3399
2.095 1.103896 170 2.3358
2.0627 1.136364 175 2.3346
2.0408 1.168831 180 2.3357
2.0575 1.201299 185 2.3364
2.0976 1.233766 190 2.3349
2.0668 1.266234 195 2.3336
2.0579 1.298701 200 2.3329
2.0756 1.331169 205 2.3326
2.1174 1.363636 210 2.3325
2.0663 1.396104 215 2.3325
2.0941 1.428571 220 2.3324
2.1074 1.461039 225 2.3324
2.1251 1.493506 230 2.3322
2.0629 1.525974 235 2.3318
2.0872 1.558442 240 2.3312
2.0994 1.590909 245 2.3310
2.0879 1.623377 250 2.3308
2.0623 1.655844 255 2.3305
2.1054 1.688312 260 2.3303
2.0736 1.720779 265 2.3301
2.1146 1.753247 270 2.3300
2.0444 1.785714 275 2.3301
2.0541 1.818182 280 2.3301
2.1333 1.850649 285 2.3300
2.1101 1.883117 290 2.3299
2.0234 1.915584 295 2.3298
2.0671 1.948052 300 2.3298
2.083 1.980519 305 2.3298
2.0417 2.012987 310 2.3299
2.0784 2.045455 315 2.3303
2.058 2.077922 320 2.3308
2.0524 2.110390 325 2.3312
2.0318 2.142857 330 2.3316
2.0914 2.175325 335 2.3318
2.0319 2.207792 340 2.3320
2.0099 2.240260 345 2.3322
2.075 2.272727 350 2.3323
2.0444 2.305195 355 2.3324
2.0428 2.337662 360 2.3325
2.0612 2.370130 365 2.3326
2.1078 2.402597 370 2.3327
2.0643 2.435065 375 2.3327
2.0667 2.467532 380 2.3326
2.0285 2.500000 385 2.3324
2.0571 2.532468 390 2.3322
2.0209 2.564935 395 2.3322
2.0537 2.597403 400 2.3323
2.0138 2.629870 405 2.3324
2.0772 2.662338 410 2.3324
2.039 2.694805 415 2.3323
2.0181 2.727273 420 2.3322
2.0484 2.759740 425 2.3320
2.0224 2.792208 430 2.3320
2.0732 2.824675 435 2.3320
2.0499 2.857143 440 2.3321
2.0498 2.889610 445 2.3321
2.0472 2.922078 450 2.3320
2.1327 2.954545 455 2.3319
2.0642 2.987013 460 2.3319
2.0654 3.019481 465 -
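
The relationship between the per-device and total batch sizes, and the number of optimizer steps per epoch implied by the table, can be checked with a short sketch. It assumes no gradient accumulation (the hyperparameter list does not mention any), and the variable names are illustrative rather than taken from the ProgressGym code.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8   # per device
eval_batch_size = 16   # per device
num_devices = 8

# Assumption: no gradient accumulation, so the total batch size is
# simply the per-device batch size times the number of devices.
total_train_batch_size = train_batch_size * num_devices
total_eval_batch_size = eval_batch_size * num_devices

# (epoch, step) pairs copied from the first, a middle, and the last row
# of the training results table.
logged = [(0.006494, 1), (0.974026, 150), (3.019481, 465)]

# Each pair should imply the same number of optimizer steps per epoch.
steps_per_epoch = {round(step / epoch) for epoch, step in logged}
assert len(steps_per_epoch) == 1
steps_per_epoch = steps_per_epoch.pop()

# Approximate number of training examples seen per epoch
# (the final batch of an epoch may be partial).
examples_per_epoch = steps_per_epoch * total_train_batch_size

print(total_train_batch_size, total_eval_batch_size)
print(steps_per_epoch, examples_per_epoch)
```

Under these assumptions the totals agree with the listed total_train_batch_size and total_eval_batch_size, and the logged epoch fractions are consistent with a fixed number of steps per epoch.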

Note that the training data volume for the continued pretraining stage is capped at 3GB. When the corresponding century's corpus exceeds this volume, the training data is randomly sampled to fit the volume.
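
The exact sampling procedure is not described here. As a minimal sketch of one way to randomly sample a corpus down to a size cap, the function below shuffles the documents and keeps those that still fit in the budget. The function name `sample_to_cap`, the use of UTF-8 byte length as the size measure, and the reading of 3GB as 3 * 10**9 bytes are all assumptions made for illustration.

```python
import random

CAP_BYTES = 3 * 10**9  # assumption: "3GB" read as 3 * 10^9 bytes


def sample_to_cap(docs, cap_bytes=CAP_BYTES, seed=42):
    """Randomly select documents whose total UTF-8 size stays within cap_bytes.

    Returns the selected documents (in their original order) and the
    number of bytes they occupy. If the whole corpus fits under the cap,
    every document is kept.
    """
    rng = random.Random(seed)
    order = list(range(len(docs)))
    rng.shuffle(order)

    picked, used = [], 0
    for i in order:
        size = len(docs[i].encode("utf-8"))
        if used + size > cap_bytes:
            continue  # this document no longer fits; try the next one
        picked.append(i)
        used += size

    return [docs[i] for i in sorted(picked)], used


# Hypothetical toy corpus: five 10-byte documents and a 25-byte cap.
toy_docs = ["a" * 10] * 5
subset, used = sample_to_cap(toy_docs, cap_bytes=25)
print(len(subset), used)
```

A fixed seed keeps the selection reproducible; whether the actual pipeline samples at the document level or at some other granularity is not stated in this card.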

ProgressGym-HistLlama3-8B-C015-instruct is an instruction-tuned language model. It is tuned on ProgressGym-TimelessQA, using the following hyperparameters. Note, however, that the snapshot at training step 10 is used for the final model, to minimize erosion of the value tendencies learned during continued pretraining; we qualitatively observe that this snapshot still possesses strong instruction-following capabilities.

  • learning_rate: 1.5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 64
  • total_eval_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: polynomial
  • lr_scheduler_warmup_steps: 20
  • num_epochs: 4.0
  • mixed_precision_training: Native AMP

... with the following training results:

Training Loss Epoch Step Validation Loss
0.8675 0.1042 5 0.8585
0.8415 0.2083 10 0.8063
0.8225 0.3125 15 0.8210
0.806 0.4167 20 0.8412
0.8139 0.5208 25 0.8702
0.8978 0.625 30 0.8631
0.814 0.7292 35 0.8550
0.7989 0.8333 40 0.8473
0.8769 0.9375 45 0.8383
0.7244 1.0417 50 0.8278
0.4644 1.1458 55 0.8387
0.4488 1.25 60 0.8680
0.3973 1.3542 65 0.8718
0.443 1.4583 70 0.8596
0.4346 1.5625 75 0.8514
0.4701 1.6667 80 0.8461
0.4344 1.7708 85 0.8437
0.4274 1.875 90 0.8434
0.4771 1.9792 95 0.8434
0.3876 2.0833 100 0.8439
0.3698 2.1875 105 0.8451
0.407 2.2917 110 0.8465
0.374 2.3958 115 0.8482
0.3945 2.5 120 0.8498
0.3753 2.6042 125 0.8513
0.3721 2.7083 130 0.8528
0.3718 2.8125 135 0.8542
0.3773 2.9167 140 0.8555
0.3723 3.0208 145 0.8565
0.374 3.125 150 0.8576
0.3728 3.2292 155 0.8588
0.3686 3.3333 160 0.8598
0.3617 3.4375 165 0.8607
0.3546 3.5417 170 0.8613
0.3707 3.6458 175 0.8619
0.3739 3.75 180 0.8625
0.3617 3.8542 185 0.8632
0.3591 3.9583 190 0.8637
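
To see where the step-10 snapshot falls on the learning-rate schedule, the following sketch reimplements a polynomial decay schedule with linear warmup in plain Python. It assumes the defaults of the Hugging Face Transformers polynomial scheduler (power of 1.0 and a final learning rate of 1e-7), and it assumes 192 total steps, inferred from 4 epochs at roughly 48 steps per epoch as implied by the table. Neither assumption is stated in this card.

```python
def polynomial_lr(step, base_lr=1.5e-5, warmup_steps=20, total_steps=192,
                  lr_end=1e-7, power=1.0):
    """Learning rate at a given scheduler step.

    Linear warmup from 0 to base_lr over warmup_steps, then polynomial
    decay from base_lr to lr_end at total_steps. The default values of
    total_steps, lr_end, and power are assumptions, not values from the card.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    if step > total_steps:
        return lr_end
    remaining = 1 - (step - warmup_steps) / (total_steps - warmup_steps)
    return (base_lr - lr_end) * remaining ** power + lr_end


# Learning rate at the snapshot step, at the end of warmup, and at the end.
for s in (10, 20, 192):
    print(s, polynomial_lr(s))
```

Under these assumptions, step 10 lies halfway through the 20-step warmup, so the selected snapshot was trained entirely at learning rates below the peak value. In the table above, step 10 is also the row with the lowest logged validation loss (0.8063), although the stated reason for choosing it is to preserve the value tendencies learned during continued pretraining.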

Links

Citation

If the datasets, models, or framework of ProgressGym help you in your project, please cite ProgressGym using the bibtex entry below.

@article{progressgym,
  title={ProgressGym: Alignment with a Millennium of Moral Progress},
  author={Tianyi Qiu and Yang Zhang and Xuchuan Huang and Jasmine Xinze Li and Jiaming Ji and Yaodong Yang},
  journal={arXiv preprint arXiv:2406.20087},
  eprint={2406.20087},
  eprinttype = {arXiv},
  year={2024}
}

Ethics Statement

  • Copyright information of historical text data sources:
    • Project Gutenberg, one of the four sources of our historical text data, consists only of texts in the public domain.
    • For the text that we draw from Internet Archive, we only include texts uploaded by the Library of Congress, which are freely released online by the U.S. Library of Congress for research and public use.
    • The text data from Early English Books Online are, according to their publisher, "freely available to the public" and "available for access, distribution, use, or reuse by anyone".
    • The last remaining source of our historical text data, the Pile of Law dataset, is released under a Creative Commons license, which we adhere to in our use.
  • Reproducibility: To ensure reproducibility, we open-source all the code involved in the production of our main results (including the entire pipeline starting from data collection and model training), as well as the supporting infrastructure (the ProgressGym framework), making replication as easy as running a few simple script files.
  • Misuse Prevention: In order to prevent potential misuse of progress alignment algorithms, we have carefully formulated progress alignment as strictly value-neutral, without a priori assumptions on the direction of progress. We condemn any attempt to misuse our dataset in the strongest possible terms, and will work with the research community on whistleblowing for such attempts.
  • Open-Sourcing: We confirm that our code, data, and models are to be open-sourced under a CC-BY 4.0 license. We will continue to maintain and update our open-source repositories and models.