diff --git a/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/README.md b/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/adapter_config.json b/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/adapter_model.safetensors b/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..203c2aee657a0c57749f99d7288471cc9cf91618 --- /dev/null +++ b/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c70e8a7a316b3926d9f462c970ec41c074914d4a87e36977084fd2f74bf7f7f6 +size 67144544 diff --git a/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/training_args.bin b/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/training_args.bin @@ -0,0 +1,3 @@ 
+version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/README.md b/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/adapter_config.json b/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/adapter_model.safetensors b/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..ec965a973415021f2c652fbdee8ba9d28c64d019 --- /dev/null +++ b/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dab4ab030ae70054acec18175d73c141430964d6cd8fd46193235211594d565b +size 67144544 diff --git a/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/training_args.bin b/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ 
b/model_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..b286651c54ad53c4e145f45a0981433e75ff211a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:402ce6229e8412ec1895ce650d643b5d6961f8124952767f9c6716afdba81f2f +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..68ca289162141fc8dcc033d6e005c57b92d3c968 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5318963444a1d1998978a51c3a65973465a1a7449983c4cfe0797070db8e1dda +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..d0cb160fc6752dc0470bb88b1ba16dca7ed969ca --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fd418aa175a4f9508778329e5c11f54241882ad7316c344103bc3804e613599f +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..c7c7ba5a5d73c30d2e2dfccf92552709b61b1a0f --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d4f7e5b3f15e6248eb69742a14f905c700ecf357f80b4e2f91b8b83b2a38d15e +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..f355a78ef0a68d412f6d220c6e28c6b5611bafe0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/trainer_state.json @@ -0,0 +1,48 @@ +{ + "best_metric": 1.5159306526184082, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10", + "epoch": 0.13333333333333333, + "eval_steps": 10, + "global_step": 10, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1638607198617600.0, + "train_batch_size": 
8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. 
(2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..4781768a59451c108bde2616690e8c131913f0e0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:24a059a59b169a597ddd32df645ceb2a3c8e989c421a3e14df010d2a14aec149 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..fddcf9886ebaec7cd3fdcdbbd60b4413036e6509 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e67b8cacabe8156dc1a25014d56f2c5c51feecc535b14b3ef5ade9be83b7cc88 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..e6cdf36295b4d559507cf0b068680edea3de3a81 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:46513e9b1de488f3d70a4461303e6b827989f588807354e14d010b7ee4f4679f +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..4f1a24bb7d4e46bd15c0b55412cc8ba9b9556c35 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cd2ccdaca083e589c09bcd97757fde390a191ed5c643ace13a70b750fd4a4e4b +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..62f9ae5d1e67bc8010e62448f5cb48904410fbb7 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/trainer_state.json @@ -0,0 +1,183 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 1.3333333333333333, + "eval_steps": 10, + "global_step": 100, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.6386071986176e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..ea4469663163841a581d7d51c3150a1b1025766f --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f25042d13827d45f341bcda02149c8f6ca7a4147e402c1b6d0c9876df95fe876 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..5fe0159ff69a1b8b2695f674dff0118d668e2f3c --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dd8eaa8be1c8410fbc70322da063c0bc7a61df217c8250ebb31c2127cecf76c2 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..e8b03e39b0cf81b4b723b9421b9fca8f87c7b414 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:319884e2d6c1fad0795ced8add37e8073910c77073120da512a5e6a1f6208d62 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..24dce4e18218617e13af9f93046f397a711717c2 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fe938637817d41932e7175fe8d9bcdaa1f1383328b73e4b56a4e373476a295ba +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..3bb1cc97b1357c2256d95871eff2f3d969e293a2 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/trainer_state.json @@ -0,0 +1,198 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 1.4666666666666668, + "eval_steps": 10, + "global_step": 110, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.80246791847936e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/training_args.bin 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..01b26af61018ad1363c0ded790bf05adb4728bf9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e723ff03145115b1b56dc4f6e5a779b2589916dcc98bc3614344636c8281f7c7 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..c7d802946dcbb7e552f3202403b9baba90c571b1 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ded8e175409363734ec70511eb219a4fbf0ad8f4fae983cf551358bdc31e78a6 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..71b7a5227226dcaeadffec096acbc7df0f632989 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3500ac793bd5f15c49da717801f854f9815260499ab4bc16b8f3a1ca9c82dfdf +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..9eb62bb8ba22966a1e254979e1d2479886d174dd --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:145815d6a6480fb85323e9a0f9a98f3e8faa57003487fcac0be85abbf27b4575 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..44225f2be50e20b8fc365ba0fe6759d24e0fcf9c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/trainer_state.json @@ -0,0 +1,213 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 1.6, + "eval_steps": 10, + "global_step": 120, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 
7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.96632863834112e+16, + "train_batch_size": 
8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. 
(2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..92270bf4226657d2f985245e5d476e8f116384e2 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:135e12a136cf3c0479a813f54b50e283c3e25778c39058c7345b976607616a7d +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..cd68451b0b1bba5b827116c3dfd51bd556164de7 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bf5e700c6ab423553061e8b21a604995373469528adb8d4fcad0085dfc2e104d +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..b60cc4cb8217ae694c7a8efef0eb0b676d897e83 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:602f503f7cd2e84c0b6719714b66d34e98b340f44b02ba8ffc44df096e786100 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..0dae8e46aca4beacf0c154c37d71abe175363a25 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:abdc7730bfbf0869132cbbd456c580122a20a540399e30640d4e51daf6f379d3 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..dbbedf7005976c5086c76ac895916226e1ee9d78 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/trainer_state.json @@ -0,0 +1,228 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 1.7333333333333334, + "eval_steps": 10, + "global_step": 130, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + } + ], + "logging_steps": 10, + "max_steps": 675, + 
"num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 2.13018935820288e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/adapter_model.safetensors new 
file mode 100644 index 0000000000000000000000000000000000000000..37fd34f3e0612cd1865f2f0e9a9810568ed84cf3 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:344860e9884a25de169d2335d934fcf549e9cdd8cb256b363dad13d1e43c2ce6 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..8d4b8b80f7c8e9d71ee72c71449ffb9dce1668a2 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cf7d34fe3e9999f0fa03c246a58e40ae18214bc1db140b84726b378593cc94e6 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..d05f19f3c7e1e4b728f62f56852d18785b6ab4d0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:03c218af617af689aa7eff2d02ae91fb859e96fcb9571b641c5e95247f137dda +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..b77086a6cbb29f3cd0e1ac947f6c71c390b2dff3 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:21a6935970b037ba9fc4b9dc75dbda421fb162f0fa5b7d5502a5e9660c005897 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..f872addbc7a105d906828431fdd2c5ee806975ec --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/trainer_state.json @@ -0,0 +1,243 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 1.8666666666666667, + 
"eval_steps": 10, + "global_step": 140, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + 
{ + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 2.29405007806464e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- 
**Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": 
null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..2078f71a15157a8918c69a35998a65500bf5602f --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:73d7d882027bc1b6345e0b8975b10f75408a586073c064542f9a11f314fce007 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..2a1aaea76277a56fbf7977056c204b6451703cb9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a2afcfda7a8d0acf7bf776b67664afb8788f482ecc5fa24703ad94022bffedec +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..61dde1ed8b180510bbda84f0c71356862600ad55 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bdf2188bfe5b1127367f0a0d0628c845d9f54239950b10ed26be9372dba68d0b +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..e9d3263bcfa5d62a56c74c931026d6e1762a1781 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0d75316f47d5ef08dad7230d3c189fb5ad736372bf2da793895c59a4ccba811f +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/trainer_state.json 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..8fb27d895c971b99f7e9991146a4c5974e170611 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/trainer_state.json @@ -0,0 +1,258 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 2.0, + "eval_steps": 10, + "global_step": 150, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + 
"loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 2.4579107979264e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/README.md 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
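The compute fields below are unfilled, but the trainer_state.json stored next to each checkpoint in this diff does log total_flos, eval runtimes, and the full eval-loss history, which is useful raw material when filling them in (and for spotting where eval loss stops improving, here around checkpoint-80). A small sketch for reading one of those files; the path is assumed from this repo's layout:

```python
import json

# Assumed path to one of the trainer states added in this diff.
state_path = (
    "output_ft_more_layers_stackexchange_epoch_9_mlp/"
    "pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/"
    "checkpoint-150/trainer_state.json"
)

with open(state_path) as f:
    state = json.load(f)

print("logged training FLOPs:", state["total_flos"])
print("best eval loss:", state["best_metric"], "at", state["best_model_checkpoint"])

# Eval loss per logged step, to see where the adapter starts overfitting.
for entry in state["log_history"]:
    if "eval_loss" in entry:
        print(entry["step"], entry["eval_loss"])
```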
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..4197f67a3fde32662f38109a7a58fd80af5e44de --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d390b389ae8c2c94b85a66ce29cdc28555977668dfbd67f1f9902e163f27efa7 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..7a0ad0ef8eafe1b1d6c7307f4f8201504f9973a7 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3e651b9aef8d49e60b77103eda6349c3b277dfacbda16e89d341094c784cbd56 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..564fc6da8e7c6b2c0f5b62f1f2e55b96ec29c066 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0f1a4ff62819275ae908067e10e49db3630270d7e753db72e5d286184508926f +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..333a8435179bb1a27e74cf71169524425347df64 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3c60f731d4cb1d489de80d48b0d2bf2049ddfec30c083dac3c65e6fc26b9708e +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..3befaf90d174ea9521b78ec85be601e790e285a0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/trainer_state.json @@ -0,0 +1,273 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 2.1333333333333333, + "eval_steps": 10, + "global_step": 160, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 2.62177151778816e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + 
+[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + 
"task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..66b1a4818a62558f3fb8ff311fd7fde1c9a7f767 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f1423df535c0b0f2937af1bdd7bfd2bcaea8b53072ab00c22c132fcb5a4ee6e +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..7ff23098240c029379fccfcd8c73de76de32abe6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:028cdb4395be39ab98a720700fbe4b82a66be9f74b009d00bc6cc10bb5b2e38d +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..c13cd397e2cbe97d2fb9e944d382c58418c6b136 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:964f6178720317ac51eb375c889b2d86c7184aa024caf52b59339853ffae03ca +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..98392616735ef4e842735f8fdb0443dd62c47cc3 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3f8316c64c3f1dcba9f5f78f5461a5450278d6310afba0a2471aa470b51e14fa +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/trainer_state.json new file mode 100644 index 
0000000000000000000000000000000000000000..3df64f3fb1300b0f6250117b5a3c6f6da4e2f726 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/trainer_state.json @@ -0,0 +1,288 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 2.2666666666666666, + "eval_steps": 10, + "global_step": 170, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + 
"eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 2.78563223764992e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/training_args.bin new file mode 100644 index 
0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..0b1beddde4e459784f3af0e860cb2355fefe1834 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ae38f9becef3d70d389a412bd817350d7a6ea5c832f5bccdf87b6cd053c197bb +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..4bd8593c606db24ea69ea7bf60f792aadfc64271 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:df7af31a821667f43fc0afd7178af82ccbef3464f09dcf14003eb6d1007c5d21 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..fdca3aeb31ce5b4aeb2c0f2ba53e3e43b6334331 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1b79baa0842c2916b082cba36f9f2b958210e6d7c1813742841fb908cae57fbd +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..4c07d6d39c8000e4887811925b35913c0d0fb9e7 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c5c510c48cbd7d4a31b049b9ce577d9a61337bf5b3120da8df24159e22a5b61b +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..c99190ddbbf131bdec909f56f42471f397decb21 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/trainer_state.json @@ -0,0 +1,303 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 2.4, + "eval_steps": 10, + "global_step": 180, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 
7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + 
"learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 2.94949295751168e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More 
Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..88e7eb84d0161fc6d363b5bfed99a740cd8d96c4 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e5435e040a0dbb1c744d7b614426ab723f152f1f1572427058a492f56834426b +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..d0d189dabf8890785cdf4f78913a6bfbaab83f35 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4359632f66950f4bc7de087d1ee3a8a5744448fac22d01d6c1000a388d7f725c +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..ae44ad6727cf9b3af903ea84902fa6c7f13a5a95 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7d6f4346bdc8a12fcc48535a6002ac46345e4ce1e14bb1f7e9dc3b0ea920641c +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..c8d96687f829fbdebf86c73104630c11643191e8 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:56d80eccd9a2998f395870ad7a48e8df26a0ef5fdd75c8bb18466e506b523f6a +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..149bd38cc4641236dbbfce0f491104d5e721ce5b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/trainer_state.json @@ -0,0 +1,318 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 2.533333333333333, + "eval_steps": 10, + "global_step": 190, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 3.11335367737344e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..ce6ec35b955b40ac02cbba2a75e33aad91a5b89e --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:11f90691aafeb561724f0252adaa4cef60be0baa0181beb842bc3eb051896fa8 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..83403a616ae6af1d7246cddd9c276dd1d43b969a --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:263539708448c83935fea005c14b69d5e259580f9c7d9416ba71842fc82c440e +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..fe515b4492af517bd45c5a5c7abbba2b94c5ae37 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a5087ba42b4dd9dc68875c89890b692068c71de7009ff67cb7d8492bce11049 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..61e40aef0a507fb8add486ba2535aadaa164b9a7 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:72a91e63074e9f0fdfc6b1e7414643f389732ccfdfe97b6b3f4c5b0d7a7556a4 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..f788c7dcd4515fc2a181bbfc2b598ce76335ee5f --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/trainer_state.json @@ -0,0 +1,63 @@ +{ + "best_metric": 1.5063730478286743, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20", + "epoch": 0.26666666666666666, + "eval_steps": 10, + "global_step": 20, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + } + ], + "logging_steps": 10, + "max_steps": 675, + 
"num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 3277214397235200.0, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/adapter_model.safetensors new 
file mode 100644 index 0000000000000000000000000000000000000000..f511cb023551d4f09c6753d17f07380d0b6f17d5 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:286b8eaf828b1037216751d3be566c7626392cec5eaa58ab3740b0d9e7037581 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..84dcd8084ee26dadfeae32e5c8f76316c4c53f93 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ce601eb2eb7fec776f14ea2a370319597b963378627e93131db6bb5ac256c3b8 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..da263858f32b7536e68a33626ef41e3ef7a44689 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0dbbe288070e588c7effbe11249d330a3ad16131211e6b5dff1d03a8ebc7517f +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..0fc1f2bea0ca1c9908bf307e3525efa76fd70425 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e5b7dec72c2b7f015512ea839980ec16d0582c7e6d0689dad8794261e73838b6 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..bf5c7e96c921ec08568dbac46fc78e279db8bebb --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/trainer_state.json @@ -0,0 +1,333 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 2.6666666666666665, + 
"eval_steps": 10, + "global_step": 200, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + 
{ + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 3.2772143972352e+16, + "train_batch_size": 8, + "trial_name": null, + 
"trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. 
(2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..0bf1dfaa23eb851d10f77a0c8a4ea2bfcac48965 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:97fad26ab375c9d76d7516fc32462a8161f91a762fd8a5929117ba0123900882 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..3709b685104a9d1f287053c0e7bd6b743347552e --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:58985e5a7616cbed05dd063b89da7035d48cfc98eb5e40a679507aa6a28e00f6 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..605214081e6b3060d6c3e526fc86e8b8fff3c71b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cd4e0019fadc179e2ea531ff33d86db759cb80e64a8826bb6bfa90c2483bfc04 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..3b44a2b0d3df617f15242e2d4ea4d5553b544573 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6c59d7cb173602f981a42f5fe61d72e03c87c9f97f456afe9fd66cd09957f177 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..ceaab679a87295eb92a3eeec697bb87040346d34 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/trainer_state.json @@ -0,0 +1,348 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 2.8, + "eval_steps": 10, + "global_step": 210, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 
7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + 
"learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 3.44107511709696e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..4fa01bcb113369355327c3f7e80f016b7b48d676 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5bb2318abc1357d981047e4a619eda386a57784375ad1f5aab808f98320ed61a +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..16ec4ce510357791b06685e72cd84fcd6bc0df63 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:63c628eb6c7df6ef71ced169288dcb77ae24f602521da1ce82820e66c29be0a0 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..823c878e3ad7d7799e1959fba97c90aaf79af4f9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f5e4256f7b7ace2dd6194570c191ab9026456dc0db24025edac4a5bd9e379dab +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..63906caa8cd7e3fc0686b7d0276e496942ef0036 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:daeceb22ea0c54e6923c8a042a9cfc5a5bc826f201c52f29454b62c289d49dc6 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..e7f1e933e530181948a234619363c5c230b3a5ae --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/trainer_state.json @@ -0,0 +1,363 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 2.9333333333333336, + "eval_steps": 10, + "global_step": 220, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 3.60493583695872e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/training_args.bin 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
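None of the hardware fields below are filled in, so the following is an assumed illustration only: the estimate behind that calculator reduces to hardware power draw × hours × grid carbon intensity. Every number in this sketch is a placeholder, not a measured value for this run.

```python
# Hypothetical back-of-envelope estimate in the spirit of Lacoste et al. (2019).
# All inputs below are placeholder assumptions, not values reported for this model.
gpu_power_kw = 0.3       # assumed average draw of one GPU, in kW
num_gpus = 1             # assumed GPU count
hours = 10.0             # assumed wall-clock training time
pue = 1.5                # assumed datacenter power usage effectiveness
grid_kg_per_kwh = 0.4    # assumed carbon intensity of the local grid (kg CO2eq/kWh)

energy_kwh = gpu_power_kw * num_gpus * hours * pue
co2_kg = energy_kwh * grid_kg_per_kwh
print(f"~{energy_kwh:.1f} kWh, ~{co2_kg:.1f} kg CO2eq")
```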
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..33efc20a738c35ad2d10dd1c897a48929b9918eb --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cde197fc1e40a1d04c62289628a33f3f0806a256568ee49d3919634a5152b41d +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..9f9062e2a2f8293c5b97260ec0b6ce88ca1db69b --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b71640db7d9f42285cc8f5ebbf81addd07b48866016e059a437f95e2ddde86e1 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..ae85ad205796b2c3955218eb7b4b348ca35978c7 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3e2b38199e26ee1965ef79aea019c0217039e7dab109a4b6e29c57f1bea63d6d +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..3e92e5593d8d7139e837b2a75209a41c074c2e8c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d7f755a0bef74517fb45fc39d7689eaec499187cc5cd60002751078b0276b353 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..42bf6ebf3f973eef06429f1549299f154aaf1c82 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/trainer_state.json @@ -0,0 +1,378 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 3.066666666666667, + "eval_steps": 10, + "global_step": 230, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + 
"total_flos": 3.76879655682048e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/adapter_model.safetensors new 
file mode 100644 index 0000000000000000000000000000000000000000..2c16126c00ca2579762a7165d2f831bc0551291e --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:06482d675684e7f12df6964c65a08f4f4de054076ad3df2486bcf642a7d7799c +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..ceaa4b49bce40d8fd90d4628b39bbf2228a0d07a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dbf9b7e37667fee64405039732d5939c3e8559ab435861276d396b48d851de68 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..846c31e0418b3b3196b4e9c5d730a866c947d1d6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:33d7857a6e3603508425c326c1a1dee439799d2c72bbfc8afcabbb8578757780 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..31e05b86275fed970cdeadc24115c84e19feae09 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f73703efe567bf60e5ab219b736abd5d1183aaab558b64454b92f8bc5cf1b3fd +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..ea9c0708ef6a2fdce67481fe3660ac2f9c07a5a8 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/trainer_state.json @@ -0,0 +1,393 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 3.2, + "eval_steps": 10, + 
"global_step": 240, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 
1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + 
"epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 3.93265727668224e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, 
and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..a4f33973bd0ad21b07e39d1bf1228b4b3ced67d4 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1ea179b26e0b03b5ff48e91b41626e0babf82828746a49802df198601bb6a751 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..b777ed01308e480361da6884ed355f7d1224b030 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4b9f158a57d289559d75238626464719ef6a4a8c27e9a524eed467b532f47c74 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..90df82c0a610ae490c2592c79d46fe23cde8d351 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a5b7a10b9f8de84d4eac8f0b5437669695e0a3ed004e055b39340577de17c55 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..174b5438f88f4c3c799b43c4f559ca991fb938b4 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7ef310c01f40cba8e9c44af8332d1cb681a7026399804fa2296ed59c6594e708 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..f0062cbbbb1edcf772cebbdae84e7297bbcb3218 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/trainer_state.json @@ -0,0 +1,408 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 3.3333333333333335, + "eval_steps": 10, + "global_step": 250, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, 
+ "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + 
"grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 4.096517996544e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model 
Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
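Since the hardware fields below are still unfilled, a rough figure can be sketched from the same inputs the calculator uses (accelerator power draw, runtime hours, and grid carbon intensity). The sketch below is a minimal illustration with hypothetical placeholder numbers, not measurements from this training run:

```python
# Back-of-the-envelope CO2 estimate in the spirit of the ML Impact calculator.
# Every input here is a hypothetical placeholder, not a value measured for this run.
def estimate_co2_kg(gpu_power_kw: float, hours: float, grid_kg_per_kwh: float) -> float:
    """Energy used (kWh) multiplied by grid carbon intensity (kg CO2eq per kWh)."""
    return gpu_power_kw * hours * grid_kg_per_kwh

# Example: one 0.3 kW accelerator running for 10 hours on a 0.4 kg CO2eq/kWh grid.
print(f"{estimate_co2_kg(gpu_power_kw=0.3, hours=10, grid_kg_per_kwh=0.4):.2f} kg CO2eq")  # 1.20 kg CO2eq
```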
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..b18d4588ef4f24a2a51844ed349d30e21fbb2a0d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:126910580480e9326ceb47a721345b565f812f87d406220e9e3a32ce383c8d42 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..cb7cb18f380922efabb2ee52bd2d3985746b6b9c --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c23ccff3f1fb147e0bd4478c6683ca742609006342cb7a9747299177bbe63e43 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..293d181974003fee2540af0648cfb4e42786ca56 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:78bbc69e88d5e1fb15138660b4de76d03b9476fa1ab2d16370f894a65eab3da3 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..9431a5b0a8e3a7cfc7a6acff3f3ba51f0ea91b16 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e388642b0db2b68dfc847810d17830763a6c1ccd5a0a2c34607435281dfa7f25 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..c1e234113ad077f049832b2afde0c68f21435c85 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/trainer_state.json @@ -0,0 +1,423 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 3.466666666666667, + "eval_steps": 10, + "global_step": 260, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + 
"grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 4.26037871640576e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. 
More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..ea8f11f65f25cfc19bf17e155e56d9d554d897b8 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9a606d4df7d781f460a6ee2d8f8158ffa3432a66ad30c87aa2ce3686f0776584 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..cbf19042d22ab84dbcaa1df99f47c07bfb287357 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3789b7900befc2f0cc5be0246013273afdf24554aeb0008223d001fb7a4ab6f2 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..ba62c782c818c1b90b0344e262a00bb91255dc87 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2af2c0de08ddef877a4af0e5f2dfe4570d2f029659f125fbfe3bbcce3a8b09e6 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..ed00d4e9803635011eee9bbdae275cac04953c1b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9b86f25c4fadc98da61c18896b4c25ab399b3a23b766274b50979d4340358b17 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..2cfb306f562ba2fcdc8471ce16577a6a7cd25c18 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/trainer_state.json @@ -0,0 +1,438 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": 
"./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 3.6, + "eval_steps": 10, + "global_step": 270, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 
100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + 
"step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 4.42423943626752e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/README.md new file mode 100644 index 
0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
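The "How to Get Started with the Model" section of this card is left as a placeholder. Purely as an illustrative sketch (not usage code published by the authors), the LoRA adapter stored in this checkpoint directory could be loaded with PEFT roughly as follows; the Hub id EleutherAI/pythia-6.9b is an assumed stand-in for the local /workspace/pythia-6_9b path recorded as base_model_name_or_path in adapter_config.json:

```python
# Hedged loading sketch: apply this checkpoint's LoRA adapter to the Pythia-6.9B base model.
# "EleutherAI/pythia-6.9b" is an assumed substitute for the local /workspace/pythia-6_9b path.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "EleutherAI/pythia-6.9b"
adapter_dir = (
    "output_ft_more_layers_stackexchange_epoch_9_mlp/"
    "pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280"
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_dir)  # reads adapter_config.json + adapter_model.safetensors
model.eval()

prompt = "Q: How do I reverse a list in Python?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```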
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..114c853b984ad29c8ab17633f2736c1d31625399 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3c325a91cc1ade41a7c65b8c51791c2b4339d9f04314282e75984adf42e6d5c7 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..e489d37d3761d298618e0c40246d3090b647ee3b --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8770d07af8ecff005acc2319daaf9e3f80b05baf775f68265d3f05cc2d04de64 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..1702f62666b39cac633a34cf312f24e311e13df2 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9ba79aaff190fd3ef9f70dd7c0a234665c2bd6c6bb243b5896c5bd6a16356627 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..5e2ffe406e2d87ea70e25bfbdad4187edda05acb --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3d68cb0fb8d225e623592feefec72ecd0b7071657fb56415f262582b52279a56 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..63d07291cd1e2da0490d4a06854df65187cf1d12 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/trainer_state.json @@ -0,0 +1,453 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 3.7333333333333334, + "eval_steps": 10, + "global_step": 280, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + 
"grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 4.58810015612928e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from 
model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + 
"loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..a688488f28fb0aee6d83fe8f84e1ca2ad4935fdf --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:26fb176d5b5bca6c93298c2d140f92ea5ebd5cc16902d04d7d43eae6549589d8 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..c1fce8d342171a751fc86998c6e564d04310ca06 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:61c57fdb09e0ea18f9c25d07bfc4ea0760f1c1f4175468352daca2cd48d15219 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..fecfedbf1488a31afeaf7c01dc4f9760cfff1b16 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:47c6345b8afbd1f7a687e942ce33ce022660a29cb46a23e4c9eda9e498053741 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..1da17016e7f80351316298af3ab35d6cc666d60f --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:518f59f6861d3d54674180d781456c4d55d82eb1d5543c592846efd5b6bea3ea +size 627 diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..53c72d516ce50006e418ce2ab90b751879b2e530 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/trainer_state.json @@ -0,0 +1,468 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 3.8666666666666667, + "eval_steps": 10, + "global_step": 290, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + 
"eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 
43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + 
"should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 4.75196087599104e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/adapter_model.safetensors new file 
mode 100644 index 0000000000000000000000000000000000000000..d0f36655500e6c0fcb574bc182e26e822cdcf457 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f8ca40d07b8b50f43b89708d4bf4d5d2f70006b37b54a53a2b15b3db3bac7432 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..3b2f39888f622545935813c53da3fda207ef3cd2 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cfcac3c68f19d704befa779226e209d2f50162fe6b326536797454f265b44fc8 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..76ee62462f7b8b87edaf24539d12d81995c70164 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3a5478e4e53ebdf948038ed344f6e976416991ec94630cb094a18d5adf7aae7a +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..8e3204abc81bf616d4220ccab7f0f13520ce949e --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:19debbf018dbf40b240b0a2ef65d5d10de2fa92e61c8838b0319c8c96ad962cd +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..9399c54864fdfd8688fbb95f352086da89bd6356 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/trainer_state.json @@ -0,0 +1,78 @@ +{ + "best_metric": 1.4993542432785034, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30", + "epoch": 0.4, + "eval_steps": 10, + "global_step": 30, 
+ "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 4915821595852800.0, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More 
Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + 
"peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..cafb74cb5b411bf66f00aecc5996f63f5e08def5 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:258c016112fabe2c427a2d7d4e3c03a302f22ba4ef8e81e400d6445845a5fb95 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..475c04451eeefeac98c5ff12464e3f4e080685b9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c469e34c31ce37267e076429a0e8f2fd5300c34894ddc62c5d5fcb38b9373f34 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..4d8ba268ef07796e970a23442889935701a1dda5 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2574c6149307e492ef05d2031918a546356cc654f4671c817f05ae6d0764de7f +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..8640fdd49a163110aee721e1510c7d552b4242d7 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:30336c219d20749546325363bfa0b5ee5e9d4b073a303024ff3ad347834b8c13 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/trainer_state.json 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..5156aa3d372f8310a8f9c8e20fbdbf1c1ceb50dd --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/trainer_state.json @@ -0,0 +1,483 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 4.0, + "eval_steps": 10, + "global_step": 300, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + 
"loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 
5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + 
"num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 4.9158215958528e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/adapter_model.safetensors new 
file mode 100644 index 0000000000000000000000000000000000000000..333f48bb8e798b74883cf6f8b319cbf83b6b35f2 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:957dfe947a2e707d8abd1956ae0a719cc4574d89fcd0ea1e020fdb01e86c85da +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..9b902589e7a667d751eda34645308412bcd77104 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5fcebc8c97b8eefdfbc106d163f18f218abc3ead0ad493caab83ffacca2fb0fc +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..a5b4503b006d8dec33c7a086d3d007eef4282144 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a82d768c5f5c231c8b50481a409281b8639e231a185281a7476164488eb6c27f +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..58f0265f6abd6b6684c5edee08f03cf244492dc5 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f67fc10f846f52b9c0359f08a436d3ebec080f189f60c98def04956b2dc83cd7 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..25d45cdd24a72f4dcdf1c5d9d21a97f53b5c9933 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/trainer_state.json @@ -0,0 +1,498 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 4.133333333333334, + 
"eval_steps": 10, + "global_step": 310, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + 
{ + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + 
}, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 5.07968231571456e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": 
null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..86400aa00d1ceffe31dd510be9f8c3a087ec108f --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:379a34c8afd19145915db369e5faaf7f0e344c8e319d242724f492030a0bc240 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..c127d94242fb2bc461577f70189b341848acb4de --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cc4877608c1b11b88af5435fa671c1e48108e718a38e9eee59bfbdbd014a67a7 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..f5fbaf3739704eea759ab29b4b9eba0fecf79ee6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4f581763059f9808c6971d543bee5e034fff1a9ec174cb7aa232dd9f17099da0 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..0f02c233f432413573681087f8ebce358efeb676 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ae2093149925b534f5c60211635bf0097e5b3bf50dc856b0e3f5b17717e52497 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..640f5f8df3f605b6d16121463b16777910be24fa --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/trainer_state.json @@ -0,0 +1,513 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 4.266666666666667, + "eval_steps": 10, + "global_step": 320, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + 
"grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 5.24354303557632e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..1e158ea5e1437c37cc9897909e4e6c5c3e3dd133 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d13074c02da6db4e7ce755374ab7d6dfb1aed73a3a4fdcca4f8ff6f0339c199b +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..c44506a243f8298c1c1b06352236b1ff9186adfa --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1b9b5b83663c35ed589437faeb6562ffbbdf507221b1948ef51ac8f8f43a1849 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..759bff60bd0897427bf9d4410df520d35fd20081 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:389caf1bb32aae3a751e11d63ffe273f089df59490c4ac6e5883d944b329df0b +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..2df061fe83adad240544d1899eb2e5e2fb23a555 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:22b896fe763bef96dfe0d570de4fea5d935b3bf80de3a9b1b2918efca334b093 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..f42a6ddb7dad4b3d2ee8186c1efbc523c3417d59 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/trainer_state.json @@ -0,0 +1,528 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 4.4, + "eval_steps": 10, + "global_step": 330, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 
7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + 
"learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, 
+ "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 5.40740375543808e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/training_args.bin 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..4ed9867e7b38f64b387bbd4544116cb1527ae627 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f32b6911f5e53adf1db6752d8d4de934ba9e3f0cf470256ae194a91af7cd18c1 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..918e8c39b9c4f592fed57be3a890bec8fa5cf7d1 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4663abbf1565150786381b2ebcc132c67780f47d106fbbb31cfc6364c7848bde +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..4d7fc830aabf2c4827b0609ed6e355d0fa80523b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b904f845552beb994fcd34362e728f918c7473ac27288d463195b51c3ed73bff +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..28521d181a67af05811165bf7cec3a0fcb49ae9d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d03bb25f48f188323d4c5dda872d760e309dedbed641397ec2ec756835c29ac5 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..e791f5aa8dc71a4a27ff4eff342e7754a52b098a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/trainer_state.json @@ -0,0 +1,543 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 4.533333333333333, + "eval_steps": 10, + "global_step": 340, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + 
"grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + 
"total_flos": 5.57126447529984e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/adapter_model.safetensors new 
file mode 100644 index 0000000000000000000000000000000000000000..013d43bf60ed5f702b773de5271e14327b904f79 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4d7d1ff0fca4583604335d960075b810a200786f86da16ee84714c18d6c158cf +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..1672e9c1a7e70fa52184237bb78101802ccd6d9a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c0846bfc1b7d649f5eceda4cce69d0b30b4031f392808f116ee9a8080ec1b04a +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..fc3bb37d365dcd8ae3528d8e7242f7d2eae755b3 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:39cd0c0a4049d541d90e7c6154cb21167a341830884ad3558195617942678446 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..3fc1a8fc07398191149b701be855b2b30b04d498 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8b8cfefd46d2412b7b17da7d799f9e9021312d0b294976f3e87f7063aa01557b +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..5e6b4b45b34a995a301e60e8a97d9de31f5e7c9d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/trainer_state.json @@ -0,0 +1,558 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 4.666666666666667, + 
"eval_steps": 10, + "global_step": 350, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + 
{ + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + 
}, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 
330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 5.7351251951616e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information 
Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline 
at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..2494e7826165664fdbf7de500f976d41681093f9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:aac6167e94a06ca84bb64a52ba33d28b4d137bb51b31900bd8a485a2aef22e00 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..e4413b7023515add68c213ab03c0e8cece3fc43f --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4a85fc0c9b738715c2f48de6253d976b5465a9881e5809ca38c7b0dc3bd5aa62 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..dff7e422d3f8fc71ea77fa33b28878ffbe8abd43 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9d73d43b628bfbe3f56e29099c04e9e9584349f935d8148aa8c34849bf03ef49 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..fc7191e0d24d86be98ffef99b67fe56b52160821 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6cb47a43082c3958508d73a1bd58f111764a18725005ed6a37a8d99585cef386 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..409b12d04f9628139ef64441bcf954ec65a79d13 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/trainer_state.json @@ -0,0 +1,573 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 4.8, + "eval_steps": 10, + "global_step": 360, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + 
"learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 
1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + 
"grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 5.89898591502336e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + 
+- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..13ba13059e0954cf9246cc28f74812b599a93298 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fcda3106ffdccd1e7bdbfb047029ee41e147c12f27ae2d22a6a223300cba0e7a +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..1b10cdf388fc22f077c38e7906148603472762a8 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:807f5f290bc195bc29b4940787471ccdc295c17efc7850e4a5410912883ba8b9 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..792417d4c800bc4c8f7eb21d5421678309a6165b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d7c0e313f3d6f9e1adc7603b9ffa6f0ab3438f71ce0c71bd9a788485d02b981c +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..8bd8e28f2af2a751646ea36889854c5eda0b2292 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1511804f46c0ca65fb38b3cc2eecf2ff9872408b4f80615834923e731745685a +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..d57ff532162af6f579ab2378e0b7809762b135fa --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/trainer_state.json @@ -0,0 +1,588 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 4.933333333333334, + "eval_steps": 10, + "global_step": 370, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + 
"grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 
4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 6.06284663488512e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. 
More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..c1f8f499ba8828d76015e04d993603ed9c08da8a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9e7dc225b1017f4c6be3b78b4b550c82b35aac19a6ebc22fc65ced87051ad846 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..dbd8307c1e7acad7e547dd300523ca9aac5d862e --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:079581da3a506c1aa3bd2bc5bfcb88b98b70809fdbddd2c0e0d2aedd272336d1 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..f3b952e81c9ed8c37528c0b9d4c13811ac0b62d3 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d5ce5744fa32738c65fe7785ec589c49d96370233c9386567c3f06dceedb5f2c +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..d9c94a2d554cb9176e1f6452c1c8064e701f6c9c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:69cb8b9fe313cb48c89565a287ca91c45004877815ee7660be6b701d2464119a +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..3f6d9eea1e021191b90064514491c0c145bffcca --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/trainer_state.json @@ -0,0 +1,603 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": 
"./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 5.066666666666666, + "eval_steps": 10, + "global_step": 380, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 
2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + 
"eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + 
"eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 6.22670735474688e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/README.md 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..e058f90bd49c7cc9b755c39c2822645634c548a8 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8b391514c2220c169b5f9972beff5735dd90fbe490f091959f70bc43b4413594 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..f7290f1ab6955d1bfacdf8116c40f8e9746160ca --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9a986a88e25bb188b7a15aa8509cce1c862f1df148eca298e1b6788ccb4c5ee4 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..b458d8885e612e71d79c420d6ca3a40dcdcf7fd8 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f47a6a8940dea009f3b7ce239248233dd458275df17acc4fa8ff99eb346e8979 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..bf1961280088992857ddb8fe8d4584423c44edf6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6bf472a6dc646995e9eb3a1b728ed47b4f764790f096bc535722b440312b4b49 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..60c06650690ffc5cdfeb4a80b53e51de022b1071 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/trainer_state.json @@ -0,0 +1,618 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 5.2, + "eval_steps": 10, + "global_step": 390, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 
7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + 
"learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, 
+ "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 
2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 6.39056807460864e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model 
[optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + 
"loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..70abe08bab328f1a803f577072d461f9230476c6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b007e22cfd973aa5d84d4cdc6842fcaca8b9c4128442719229cd151bced4778d +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..9fa2bb9d964762dda499e171204947678cb35dda --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:454db8ec0a72c1dc7eb69bd72467e5359617e1dcb7c09ff5305d1deb5b89ad79 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..cc0cb9030af17e56f3ab00fc0ad6850b4636069d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b5fde33a4ff115b0a519c0ef179183e0540c837c91cce3dba97312fa8e725570 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..1159228ea69439db76026731513cf5c71e57f3eb --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5f953d62fd365ebab5cb8aad6e7c0cdb075e95f55a4cb36b4f4e0198710f2320 +size 627 diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..0eba2a6659649a3110b6b75a349912ff509ba77e --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/trainer_state.json @@ -0,0 +1,93 @@ +{ + "best_metric": 1.4972199201583862, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40", + "epoch": 0.5333333333333333, + "eval_steps": 10, + "global_step": 40, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 6554428794470400.0, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..0f42a701af2e023855c4ed46b6e41eca2778966b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d8a9ef23b9d559f36516922f31569bf814828daaad4a1b14128e4bfcd843c654 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..a83126ade213bdfc8d0a736faef13428fb1e1e54 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:93b9d944de977217054d2e9b9f6391392247118308ac39d50cd3f6c5ab6f1031 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..d06e3c475517e0d14c13a6ccad84a3f20110949a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:96f529f9856ab8a411ac6b8078e33cfc18c0159c4947cd8cac8e1238fc1754c7 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..f91719d8a1b8836b7155587d155c2b2cfc9c7e48 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bbe59d4638e3afc1c337d3e4814ea99d33c22eec7bbc39984af69898855ffb2b +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..9fbe8f941a7455ec6812e0541348af59cbd863a2 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/trainer_state.json @@ -0,0 +1,633 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 5.333333333333333, + "eval_steps": 10, + "global_step": 400, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + 
"grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 
4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 6.5544287944704e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model 
Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..8f829a70b8d999e54a46290ffa4c7543e73d6c60 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:efe78e96a1c604db54c36883034fb8244c616a0f942b68dea7ce611bc990ac90 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..100074f0e63cacc0078b5b19b1dfcbc489613b3b --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b77261a85948ba2e5cddbf69fa3258b5a2aceed961641588fc816bf9d09382c0 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..090a1de878697aa3e6255ed23ff26ce6e561a9fa --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2cab01f3c0a9d66cf16eec91d8aebbfd533628e45bdb849b4c3e4ad317f15270 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..390146116c48e62f4426eeb3a1cf7a2ccb90f69b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:72422499e547842d9c164e7afacfea53fe3941a7a106527c3755c473fa91c799 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..8ebf69e09abbf2bc25d46f0d4a37552660c3cb77 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/trainer_state.json @@ -0,0 +1,648 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 5.466666666666667, + "eval_steps": 10, + "global_step": 410, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + 
"grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 
4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 6.71828951433216e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/README.md new file mode 100644 index 
0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..9eef6820a97d9fa9878c07a86a27fabf333ae92c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:656c88f2faea82e931ee04f41dde5eecbc2d4d97fa6eab81116921c69427e46d +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..4bbb3a5e2122f8a4e59f2c63879d099a09ab2097 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4c195c52b70d2ea37efecb8353ef993645a1fe86ae23af028614446a841552c0 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..7c168ba589ab149907f65c12980a55da76890995 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:02f02c3c7264962c7bbb05c73c2c2f9530a34cf2c29d550cdc787ae19eb6d9bb +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..6b66301b7ac8ccf1308c1ac8d63d7000259489d4 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5336fb81030d9ecbffa34471d17a4c3981e781c865d7ff7a9b59e360e4230577 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..00d0ff4bd3aea34e81baf266b15592bed8adb9f2 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/trainer_state.json @@ -0,0 +1,663 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 5.6, + "eval_steps": 10, + "global_step": 420, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 
7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + 
"learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, 
+ "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 
2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 6.88215023419392e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..a3ccb8fcf546fe886d74d8c0b9d6c40e4381cd48 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:04e82b8370d28b76f11f9e5100144e76aff755917b02a0e3cae43619136532a2 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..53cdc2d39e18ac7e4dc91c86e30edf7c98091cdf --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:92586573e1cb859486e32952b5a38ca8aec0a31f0296ef7681511232fac478da +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..eb08c850753d158caff59458c0a4d2fa22ad5de8 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5f5c1faf0e9eb010c64f51b35236463635709da903fff7194839666558e862b6 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..e44e26be68a19106ac45dab84a43a732acb91528 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:579c34af7d7ec0609fbd3479f4f8d8571c4cef90c76d9f6bacf43740f58855d8 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..2e88a8015a3e63f3a3bc3c4df9b7a041c4731ba0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/trainer_state.json @@ -0,0 +1,678 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 5.733333333333333, + "eval_steps": 10, + "global_step": 430, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + 
"grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 
4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 7.04601095405568e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..84f6a14cb25037e8ab718591e3a5e7af6f5a934b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a91249a3a8b811024482cdf57f5821fc97db4e2e1c13b2bedd20fd46bac5513b +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..9539b34d96751f8b8209606b017113cecc9b0534 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f1d2c31ce2af8009ba4241965f82b9ec13fd4d3ac29f3bc812409b02aa883d31 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..5fdc5e50e381540856fecccc6c375074d1aa7b0a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:54abee51bb88479cda4bf77e85c2a545e7fb3c5e42f56d1baa63f1344dcc0529 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..f8f2b85f23363ba098112683059a3e46233b6bfc --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:01a563f529b13f402d286b14bda74d3530e1fcecb2bee786164bfa1339da3729 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..1527fece1c7cd536cee12aa4043d1c109d4536aa --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/trainer_state.json @@ -0,0 +1,693 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 5.866666666666667, + "eval_steps": 10, + "global_step": 440, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + 
"grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 
4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 7.20987167391744e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/training_args.bin 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..447ca9ebacfd293a91b1819849d45e97fcabb36b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bfeeeff656b4275f702d060282a65f5dea8c33e7d02791d5ea98bb094913c1ba +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..522fe9fc3790cfd21a87357eecf96b231df8cd00 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0093325945904cf4c2b155fdedccc4bed715e50369142ff2817e262ede8c4005 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..3e7c44b011328e871a23ca1fea7cc6ea78d70a29 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4cc0a8131f9f14b855b33975c5e795a94be3a332a0f3cf68a9ec3ab6ce73b177 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..671e99d731836dff5ed479ba9e24ab368c795616 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bb8360cb66be4e8be27b2f376c800950e3f00449fb6491d6247165f9aff23820 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..7febfc3c66f7cddf3d572f19c991b50155ca72c4 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/trainer_state.json @@ -0,0 +1,708 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 6.0, + "eval_steps": 10, + "global_step": 450, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 
7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + 
"learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, 
+ "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 
2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 
7.3737323937792e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/adapter_model.safetensors new 
file mode 100644 index 0000000000000000000000000000000000000000..d8475d3c95dd4e0c047ad754367b67188342276b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d8ee0867e5da8e3f25dbf49bc22824916516427ee41294aad1367e68166471a2 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..e5f81aeb8ffdcfe093325e8d9eafd84d3e1ea154 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c299442d4284adb5fed3a2171541d06566d0dffc421e8952ba1ca0a09870c3ed +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..82f7415495fcd1c3ffb5dae79c8c3a4c2269faa6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9a6424cc1a4d391795fbea6a94823363dca21ce0e7ec6c433e8cb5b0aca0060f +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..82f0764fa1ca7bd5d0d2c27e699e54f97149a9da --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:09ce894ec673ae7c851228a15e2e8a3dfc488203c01cbf434a7c4cbec9b7becb +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..21fcccd30d2a0986e9015ffe3ef2282c00ea6e49 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/trainer_state.json @@ -0,0 +1,723 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 6.133333333333334, + 
"eval_steps": 10, + "global_step": 460, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + 
{ + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + 
}, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 
330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 
0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 7.53759311364096e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More 
Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ 
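The README's "How to Get Started with the Model" section above is still a placeholder. As a non-authoritative sketch (not the authors' own loading code), an adapter saved with the `adapter_config.json` shown here — LoRA with `r = 8`, `lora_alpha = 32`, dropout 0.1 on `query_key_value`, `dense`, `dense_h_to_4h`, and `dense_4h_to_h` — would typically be applied to the base model roughly as follows; the base-model path comes from `base_model_name_or_path`, and the adapter directory is assumed to be one of the checkpoint folders in this diff:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "/workspace/pythia-6_9b"  # base_model_name_or_path from adapter_config.json
adapter_dir = ("output_ft_more_layers_stackexchange_epoch_9_mlp/"
               "pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/"
               "checkpoint-470")  # any directory containing adapter_model.safetensors

tokenizer = AutoTokenizer.from_pretrained(base_path)
base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_dir)  # adds the rank-8 LoRA deltas on top of the frozen base
model.eval()

prompt = "Q: How do I reverse a list in Python?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

If a standalone model is wanted, `model.merge_and_unload()` could then fold the adapter into the base weights.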
No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..962d4afef4316e78f2282a4d9375b98a2a286bbb --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6a9adf888c956d8c9bd2b7570082f97e8c4cf7a05055146d5f5308fc40b952b6 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..dc2acd4efe03d7fd58fb416531266d7767a7ae98 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:347197a5d61ae20326bf862b1c3dfadd4fd201d3f1fa544ed32eede3bbe503e1 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..84ca1f63cf231e2aa1c43b465c46ef11c80bc867 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:03fc4a1860f68759a4d7833f4317681e377d4e71cf91ab1f091da8cd71579d26 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..9e4c1530ac9944d4b54caf372d4f9930c6597321 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ee66a0b6b4d05213664fc79a1ffd83a3bbefdb7154906787c3ef06bfdc4539f5 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..871eb3359700c75c3b92bc9d8a9bb6d81d522da6 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/trainer_state.json @@ -0,0 +1,738 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 6.266666666666667, + "eval_steps": 10, + "global_step": 470, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + 
"learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 
1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + 
"grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + 
"grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 7.70145383350272e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + 
+### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
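None of the fields below are filled in for this run, so no real figure can be given; purely to illustrate the arithmetic behind the calculator, with every number an assumption rather than a measurement:

```python
# Illustrative only: these values are NOT reported anywhere in this repository.
gpu_power_kw = 0.40     # assumed average draw of one data-centre GPU, in kW
hours = 8.0             # assumed wall-clock training time
pue = 1.2               # assumed data-centre power usage effectiveness
grid_intensity = 0.4    # assumed kg CO2e emitted per kWh of electricity

energy_kwh = gpu_power_kw * hours * pue
emissions_kg = energy_kwh * grid_intensity
print(f"~{energy_kwh:.1f} kWh, ~{emissions_kg:.1f} kg CO2e")
```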
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..fb63627e137b820b06308f0877b350348604e5a6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4ae1459dca0a074673ba5e8ed44cca4249ad32d6b767d423fc2c4d415449d1a6 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..0fe341c0969bfe5cf00ca66bda34b5878b440ca0 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e81aaf29f495697f1a1b4572f33ea04e5afd1ae5f58372ffc106ab76da4fcb7b +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..302025be6f88ae472170fe5d230ba39d4ec976df --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:918d6ec8ede8d7a880512e2fc44b16d7c22df85e8b411a004d142edcf446c40d +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..9c7a583c3e236b2f110dd12004cef1d9a2b13311 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:535824b66976a8cd20163034000bf2ae1a203551ed6ea6132858b6421f4024c0 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..100809e6d8178ce20b10b7cd8ba9230e0f1b4861 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/trainer_state.json @@ -0,0 +1,753 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 6.4, + "eval_steps": 10, + "global_step": 480, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 
7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + 
"learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, 
+ "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 
2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 6.266666666666667, + 
"grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 7.86531455336448e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. 
More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..fe87cc20445e2050bbe71b750c573a73fb98043b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:117bba99c1b8c8dee4f82d0514c15c3fe39ee15d1f15e4d5f5c169530166072a +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..14c330753381ac49ed5908c829bf1792cfc5f4ad --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:49ba3c191aa5039d99c544797045b7c4a6c364f768cd0e58d9c1adbc3714223f +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..031b265de35950a615eacc2c86e46292f552e541 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b56a3ff26dded8216d560cf73ba4817b5973851b78edbbf6aa9d6b515761df8c +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..5c509b1230d4d9d9bf05bb1cf38bcd2d3119d2c8 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:65f3e63eff29379b2f31d4f746c0c715c2b686bd11d7e07aba3d5f29231a18da +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..878f6941903ce4bf1b34d9e75edf7acf06eeba39 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/trainer_state.json @@ -0,0 +1,768 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": 
"./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 6.533333333333333, + "eval_steps": 10, + "global_step": 490, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 
2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + 
"eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + 
"eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + 
"eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 8.02917527322624e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/README.md 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..0b14eca1d81e165104324143e35cb2abf9a7a12d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3f806a91e2a4c8efc7fd916963acd9c939dd7a738be20ae79b37aaef4ed80634 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..f8f39899a8ed484e684f7522dc4497bb194b6bfe --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:00e098195dfa5537fd8e9a1858b5a2727b64cfac8213d2a5c6532fc0b885ae84 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..c1fc54eb4786e9f15244e8e4274b14688b87da5d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7062fa0264c6fb17100531852b46c235ce631a6626d5e19749a65ba8723532c0 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..cee24f7781db565e483521e84ddc6dd277a07ef3 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8f79415c3ece613ed89d676bff22f42086790a2bced0de6758824fb8c7e27fcc +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..592da03a3a349a3281555f6312bf86fb8317d4a1 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/trainer_state.json @@ -0,0 +1,108 @@ +{ + "best_metric": 1.496050238609314, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50", + "epoch": 0.6666666666666666, + "eval_steps": 10, + "global_step": 50, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 8193035993088000.0, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More 
Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": 
"CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..10bf0b2f8f922d479ba692dd82da995e72c75cbb --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:82e72f1d32df138adee953a7540457c15ddf6c1b9ce9e1107a8ca8974707ca5c +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..6c6ad94c10ff64f66a7c53ceb7d07a2d8e8620dd --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:40a684b7bdd76da8b2430bec99a57d036d743655ca50378b98d84c8d945ac3ad +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..96edd96602542afab3935d537c8d1428ce43196b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:beda198a64f1e6f1db0895ff6a6859c2af4c98fbf9c15d1daa4dcca9c20f50be +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..36002f421a8027f0e22e1cea8d6c317eebfd0e2d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e63d56828d52c149ac34c43bdc2adc48c363068c94b9a3df26528670b68d615b +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..ce65a4d6e769416b984487fff8fd1c14618cc8eb --- 
/dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/trainer_state.json @@ -0,0 +1,783 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 6.666666666666667, + "eval_steps": 10, + "global_step": 500, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 
0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + 
"epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 
4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 
5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 8.193035993088e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..bf43a3dbbc70da4b695264a660508e8b2337fc38 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b6a49c9cf73b2a456465e9023ad2873a65f9f7db1d47587280fe9619f94d4429 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..19f7d5f13c211db4857a5832edcee736fde51867 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:16c63b3ba5a5d6950b9d6f3c855db1e16577fc14e499febd4c3b5a9f5ce27261 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..52b85f2bd42c764f793cd9aa8382577ad1b51617 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:156b16fe2af6b1592b431fe36919ba4914ab9e672f318f884f5045be66654277 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..a5298f7a45852e72ab3264eef95969ac26ee5012 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:14a2456b0fb437e597f1bc67f02d12ea64caadba3ce80e5a7bba56290d13a10e +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..119ac108d972f354e3d28ae59c485e03b315342b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/trainer_state.json @@ -0,0 +1,798 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 6.8, + "eval_steps": 10, + "global_step": 510, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 
7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + 
"learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, 
+ "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 
2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 6.266666666666667, + 
"grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.0799033641815186, + "learning_rate": 1.9555555555555557e-05, + "loss": 0.7604, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 1.8695520162582397, + "eval_runtime": 43.8581, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 510 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 8.35689671294976e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + 
+- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..e4c91fa73dfeb474efa48af519bf79d3e2f5c6e9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4fad7d7c6be2f626294983111ebfb77a8ee3e419a70834bd9046c10f93092100 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..e81b779d9f425c421edb234ba37ba2a8ba293e10 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e561736c887767567075a74e6984d991ca21ee494d2ff9527f576850d7300cae +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..736afdcce42e3e1d5dec3aedeed239bc0b63975c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ca29f15bc2264125f00923607dbea007ec921af3e528271a2bb77db5cd4d2b66 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..d470aca3bc75a59cd83f65a7641e2227523184b0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:54453f7799a2c12a65729e49535ef0d1133252bbba34418ca96403f477d1ed92 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..7be318f3c650a9ce9f1ab941ffe4b3a29742cba8 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/trainer_state.json @@ -0,0 +1,813 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 6.933333333333334, + "eval_steps": 10, + "global_step": 520, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + 
"grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 
4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 
6.266666666666667, + "grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.0799033641815186, + "learning_rate": 1.9555555555555557e-05, + "loss": 0.7604, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 1.8695520162582397, + "eval_runtime": 43.8581, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 2.419006109237671, + "learning_rate": 1.837037037037037e-05, + "loss": 0.7594, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 1.858919382095337, + "eval_runtime": 43.8489, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 520 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 8.52075743281152e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/README.md new file mode 100644 index 
0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..82deff2841289de1a62b96e280261502dcefb2f5 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cddef2c7d9af4c7ad07420005f9e5cede98a3222bbf3c5b8b439795abb0c83c3 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..698db757363874afe802fdbed4aa37c0f286f544 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:97d8d7e583c90db406f992d771c382e354015dfebdbb545f9c497d63cb7f5b9c +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..b0413aa128dc89fb63c7a74242ac1a6da3ecf5bf --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e9436217a6dd3838565d7b9845d97ff2e933eb514cc6ac99465ebc3448de3312 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..3038016ab1789281fcb7570057f9ac7ff03feda9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:67026c5b7b6af0a730215316d61a8dcdd8b26b784be7a50e23105aea365fc01d +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..feca0cd028f2025c560d656a7f3223c3896fced9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/trainer_state.json @@ -0,0 +1,828 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 7.066666666666666, + "eval_steps": 10, + "global_step": 530, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + 
"grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 
4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 
6.266666666666667, + "grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.0799033641815186, + "learning_rate": 1.9555555555555557e-05, + "loss": 0.7604, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 1.8695520162582397, + "eval_runtime": 43.8581, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 2.419006109237671, + "learning_rate": 1.837037037037037e-05, + "loss": 0.7594, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 1.858919382095337, + "eval_runtime": 43.8489, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 2.5701613426208496, + "learning_rate": 1.7185185185185185e-05, + "loss": 0.8374, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 1.8769150972366333, + "eval_runtime": 43.8618, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 530 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 8.68461815267328e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..7811130dbba36f75ed79084eb465f45f26f11ce4 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d1040f20ceae7be2edc286252a318daea4c3c93f263aab8b811bff3bc6ad8586 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..c00f5ba4664067ee9b1a58506e904c7beaf3e341 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3c7568cbbc4c0ed0f26cc56ec8e701a37455ccb67afb85048c26dca96c3967c6 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..8d48caf21e655a01d7675a2b465c934cea676943 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:816bfad4f86e01da7fe3bd5bf7d10c902cf135a5b5fec9e0170158290fe5828c +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..4191ea1c11397f76dbbb9677283fd3b541b6e689 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:61e7bf31ab25b6a7b2f0902a2e1f6ca5545ad296580f627246378508da64fa41 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..2e50384e6344911d5aa0006daffdefd98bb49612 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/trainer_state.json @@ -0,0 +1,843 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 7.2, + "eval_steps": 10, + "global_step": 540, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 
7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + 
"learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, 
+ "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 
2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 6.266666666666667, + 
"grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.0799033641815186, + "learning_rate": 1.9555555555555557e-05, + "loss": 0.7604, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 1.8695520162582397, + "eval_runtime": 43.8581, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 2.419006109237671, + "learning_rate": 1.837037037037037e-05, + "loss": 0.7594, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 1.858919382095337, + "eval_runtime": 43.8489, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 2.5701613426208496, + "learning_rate": 1.7185185185185185e-05, + "loss": 0.8374, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 1.8769150972366333, + "eval_runtime": 43.8618, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 5.503419876098633, + "learning_rate": 1.6000000000000003e-05, + "loss": 0.7094, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 1.9237473011016846, + "eval_runtime": 43.8515, + "eval_samples_per_second": 22.804, + "eval_steps_per_second": 2.851, + "step": 540 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 8.84847887253504e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..62f2b911afe47d3a8cb68ac94df957bcae710364 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:795fc7b41d954c8d222fbc4b2a869c0b7d8cb68aff36d5ae5922dedcb75f43b1 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..fe50c721c6d8b229dae5ed537da7716d0715c7c9 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2b43786ede72f66795d4c7a9c2b5f945a151d89f4b2b50b39157803748fd3975 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..9dc1ec111f2a6f7fbe8d878013e83df65b5f618a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5b6faa8c50c89ce52c86274c8c795afb3f00524e7aef4544572df4b5b6b12c6d +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..f59d0454a2e540196c447dc81e215fde49e60f8d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fe5b11bd9034273a78668f95788292b87ad00f4f53e9e4864d3471380b5838b8 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..78a492f0a669dd0a1f215259a9c692752591e1d4 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/trainer_state.json @@ -0,0 +1,858 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 7.333333333333333, + "eval_steps": 10, + "global_step": 550, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + 
"grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 
4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 
6.266666666666667, + "grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.0799033641815186, + "learning_rate": 1.9555555555555557e-05, + "loss": 0.7604, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 1.8695520162582397, + "eval_runtime": 43.8581, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 2.419006109237671, + "learning_rate": 1.837037037037037e-05, + "loss": 0.7594, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 1.858919382095337, + "eval_runtime": 43.8489, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 2.5701613426208496, + "learning_rate": 1.7185185185185185e-05, + "loss": 0.8374, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 1.8769150972366333, + "eval_runtime": 43.8618, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 5.503419876098633, + "learning_rate": 1.6000000000000003e-05, + "loss": 0.7094, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 1.9237473011016846, + "eval_runtime": 43.8515, + "eval_samples_per_second": 22.804, + "eval_steps_per_second": 2.851, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.483323574066162, + "learning_rate": 1.4814814814814815e-05, + "loss": 0.692, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 1.9152519702911377, + "eval_runtime": 43.8396, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 550 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 9.0123395923968e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/training_args.bin 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..a29fbab5753406a5a774733291d22d63e19358b4 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:356a149dc9e5142b58b74caf23f05f85989366f0e9054792d31b3341ce093a0a +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..76bd20f1d90697279821491b4173b0948674b414 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:68bdbc2ceecb95b0da5788854ced0654c489c9432d240f33366c3eb559ab37f8 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..75311ff97c8628cb71fe6f6cdca5e9e1127d30b6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3b6745ab2a92f54dcacb73c3ceec9d54235e5b225134fb7703879ee6185ad897 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..ffa4f4faa1638037c7009a7874a8ec2f958a56f3 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:01b0d6dead233f71cda974ae02165d32469a3692fb9b97739fca51d1798a012e +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..2c11f94afc1d09461703ad0e4e608159972cb050 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/trainer_state.json @@ -0,0 +1,873 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 7.466666666666667, + "eval_steps": 10, + "global_step": 560, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + 
"grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 
4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 
6.266666666666667, + "grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.0799033641815186, + "learning_rate": 1.9555555555555557e-05, + "loss": 0.7604, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 1.8695520162582397, + "eval_runtime": 43.8581, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 2.419006109237671, + "learning_rate": 1.837037037037037e-05, + "loss": 0.7594, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 1.858919382095337, + "eval_runtime": 43.8489, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 2.5701613426208496, + "learning_rate": 1.7185185185185185e-05, + "loss": 0.8374, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 1.8769150972366333, + "eval_runtime": 43.8618, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 5.503419876098633, + "learning_rate": 1.6000000000000003e-05, + "loss": 0.7094, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 1.9237473011016846, + "eval_runtime": 43.8515, + "eval_samples_per_second": 22.804, + "eval_steps_per_second": 2.851, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.483323574066162, + "learning_rate": 1.4814814814814815e-05, + "loss": 0.692, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 1.9152519702911377, + "eval_runtime": 43.8396, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.480727195739746, + "learning_rate": 1.362962962962963e-05, + "loss": 0.6614, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 1.9116826057434082, + "eval_runtime": 43.8448, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 560 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + 
"attributes": {} + } + }, + "total_flos": 9.17620031225856e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/adapter_model.safetensors new 
file mode 100644 index 0000000000000000000000000000000000000000..59dc76c25827ef37c8721421d8c0c241ac9e3931 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5909b87e1006a1a42045aaff2444a39584580a94389aaab0b4615ff2afc54f2a +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..9be4ea67c5617b960f5bb46d76ca2fa35d4c28ed --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:76f0db820f28c133a7c34cfb33bbc74696c0140e21547781230f746058a670f7 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..3ed38f9a78b3dbf6f2e73e5bd68681ac198b1983 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d966d92a47b281ed57ee7f44ee2eaa60a54786f7ca9b7e8829ab8723bc8a5a1d +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..0f6ab5a4a6c1c8537d29396d68ff9a943067c8eb --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6a34ac3e6b737b225204c7a1c95f58427255f84b0986866cdac344b9d5ba4319 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..8bbee6ed3b6864bb37ea40bb99b0727be8c2cf90 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/trainer_state.json @@ -0,0 +1,888 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 7.6, + "eval_steps": 10, + 
"global_step": 570, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 
1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + 
"epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + 
{ + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + 
"step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.0799033641815186, + "learning_rate": 1.9555555555555557e-05, + "loss": 0.7604, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 1.8695520162582397, + "eval_runtime": 43.8581, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 2.419006109237671, + "learning_rate": 1.837037037037037e-05, + "loss": 0.7594, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 1.858919382095337, + "eval_runtime": 43.8489, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 2.5701613426208496, + "learning_rate": 1.7185185185185185e-05, + "loss": 0.8374, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 1.8769150972366333, + "eval_runtime": 43.8618, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 5.503419876098633, + "learning_rate": 1.6000000000000003e-05, + "loss": 0.7094, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 1.9237473011016846, + "eval_runtime": 43.8515, + "eval_samples_per_second": 22.804, + "eval_steps_per_second": 2.851, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.483323574066162, + "learning_rate": 1.4814814814814815e-05, + "loss": 0.692, + 
"step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 1.9152519702911377, + "eval_runtime": 43.8396, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.480727195739746, + "learning_rate": 1.362962962962963e-05, + "loss": 0.6614, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 1.9116826057434082, + "eval_runtime": 43.8448, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.9647531509399414, + "learning_rate": 1.2444444444444446e-05, + "loss": 0.8231, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 1.9044451713562012, + "eval_runtime": 43.8427, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 570 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 9.34006103212032e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information 
Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline 
at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..5209f5c0ca1b7146c005d86c4e2ecf2dbf99cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:494450696f016076c36f8a1a24d1baf47f82eccaa889b76f19e0b1e8449e56b1 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..3dbe73ebfb1908cddaa6e05fbf087a5e37555ac4 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b180c1306aa858697d071fc99f658fb515fb3b8ff8bfdd0b5fd52b1844e618f8 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..6f12baaba3ec135e726e0b75dc20ee8cfe8a995d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:55a6ddc6425602c9554969e2910a1ee66847f95ab8fd86352843e16c6530b2c0 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..df177b9452bbc35cb78b91089f310520fe740b94 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3aa3352ae201120fa831c764f5b07fe3f9aa427e68763e4c88ed9af407727f22 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..9e1769b0ddad4ada5ed079c468034bf1b89bbb7b --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/trainer_state.json @@ -0,0 +1,903 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 7.733333333333333, + "eval_steps": 10, + "global_step": 580, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + 
"learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 
1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + 
"grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + 
"grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.0799033641815186, + "learning_rate": 1.9555555555555557e-05, + "loss": 0.7604, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 1.8695520162582397, + "eval_runtime": 43.8581, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 2.419006109237671, + "learning_rate": 1.837037037037037e-05, + "loss": 0.7594, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 1.858919382095337, + "eval_runtime": 43.8489, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 2.5701613426208496, + "learning_rate": 1.7185185185185185e-05, + "loss": 0.8374, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 1.8769150972366333, + "eval_runtime": 43.8618, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 530 + }, + { + 
"epoch": 7.2, + "grad_norm": 5.503419876098633, + "learning_rate": 1.6000000000000003e-05, + "loss": 0.7094, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 1.9237473011016846, + "eval_runtime": 43.8515, + "eval_samples_per_second": 22.804, + "eval_steps_per_second": 2.851, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.483323574066162, + "learning_rate": 1.4814814814814815e-05, + "loss": 0.692, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 1.9152519702911377, + "eval_runtime": 43.8396, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.480727195739746, + "learning_rate": 1.362962962962963e-05, + "loss": 0.6614, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 1.9116826057434082, + "eval_runtime": 43.8448, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.9647531509399414, + "learning_rate": 1.2444444444444446e-05, + "loss": 0.8231, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 1.9044451713562012, + "eval_runtime": 43.8427, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 570 + }, + { + "epoch": 7.733333333333333, + "grad_norm": 3.6963553428649902, + "learning_rate": 1.125925925925926e-05, + "loss": 0.7385, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 1.9102691411972046, + "eval_runtime": 43.8393, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 580 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 9.50392175198208e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + 
+### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..543cf3998264bb4bfb97c7f9b4ec68b7f406f9fa --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:65b6fb7b251c8b589941b3fff23fa13672e01249cedc7aad1a8b7b217abedfcb +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..a27d0938dc1327ba573ee2caa3156b8232deeec2 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8875e78ad1e3193b22706f4b9860fae59e51f2d968bb43e9821b27ba81ae8d77 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..f2cbe02e4922a4920c0a827f09f6df580967beb0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c5704b322a17ce5b2788c1247543e3ca9edc36d083fd8ecc8ca80d04334c6030 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..e93107ab0b0cdc649d183c879754ed083006f9d7 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c640cac3d338c5c53c53ad351f9ec822b97e3962fe58e3c4439d6cecd03512ac +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..059f53d94e5bd318e3c03570154765c94d783b89 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/trainer_state.json @@ -0,0 +1,918 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 7.866666666666667, + "eval_steps": 10, + "global_step": 590, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + 
"grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 
4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 
6.266666666666667, + "grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.0799033641815186, + "learning_rate": 1.9555555555555557e-05, + "loss": 0.7604, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 1.8695520162582397, + "eval_runtime": 43.8581, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 2.419006109237671, + "learning_rate": 1.837037037037037e-05, + "loss": 0.7594, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 1.858919382095337, + "eval_runtime": 43.8489, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 2.5701613426208496, + "learning_rate": 1.7185185185185185e-05, + "loss": 0.8374, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 1.8769150972366333, + "eval_runtime": 43.8618, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 5.503419876098633, + "learning_rate": 1.6000000000000003e-05, + "loss": 0.7094, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 1.9237473011016846, + "eval_runtime": 43.8515, + "eval_samples_per_second": 22.804, + "eval_steps_per_second": 2.851, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.483323574066162, + "learning_rate": 1.4814814814814815e-05, + "loss": 0.692, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 1.9152519702911377, + "eval_runtime": 43.8396, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.480727195739746, + "learning_rate": 1.362962962962963e-05, + "loss": 0.6614, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 1.9116826057434082, + "eval_runtime": 43.8448, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.9647531509399414, + "learning_rate": 1.2444444444444446e-05, + "loss": 0.8231, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 1.9044451713562012, + "eval_runtime": 43.8427, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 570 + }, + { + "epoch": 
7.733333333333333, + "grad_norm": 3.6963553428649902, + "learning_rate": 1.125925925925926e-05, + "loss": 0.7385, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 1.9102691411972046, + "eval_runtime": 43.8393, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 2.862262725830078, + "learning_rate": 1.0074074074074074e-05, + "loss": 0.7405, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 1.9190014600753784, + "eval_runtime": 43.8405, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 590 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 9.66778247184384e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and 
limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..e3755df10b915b34d6a6498699d6fe176027794e --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a50523c87f02d2bddc7cd31daa155bdf2a836e57ad797a56aeccaa3df36ebb8e +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..ac7c8f5ead998553a2396680069798ae2c81dab9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:edb61ee69352d95ac874c0a48943adde7479e38d72508b9e241bcaa5df70dc83 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..3d041c10a3af80c2be01488b87e7c23a107acab4 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:224b98cd2a3813f8f156af229101dde99ced2e24294f3d7ad7b1538fdc49c27c +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..e35866f32db88c57fbcc281885df929786abae39 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:db64dfcaaa6d2770fdeb8c6c250f6efda7e6b2cbc236d50bf153703fcb63ac50 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..81ea0c0cb46a37229dd52443f5d74d1acd2d2e45 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/trainer_state.json @@ -0,0 +1,123 @@ +{ + "best_metric": 1.4949277639389038, + "best_model_checkpoint": 
"./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60", + "epoch": 0.8, + "eval_steps": 10, + "global_step": 60, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 9831643191705600.0, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..212c54bda1a79daa9482c03eecf9c67ac7902b00 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8d63757df321652f70b7b0f6adf15a37df078e07422a0b6f6b2e410f59d7fcee +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..0969b2ae62041a75c7b51d98320a67dc2cd43a4a --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ef3fc061de468a4034fec870a77b43510609eed0c89fc7f2274b1e930d42b2e3 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..ef40b259bc3233779099c3b8651c2fe0a9d07fa5 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9bbc772ea5a37ab482a5fa0d13a2014584215ee3da6246ff6fe50fb8dafbfb8e +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..b706276d2ff18ccd83310c61d87eb2ed9fc15f80 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:efa99c3d03a71b7b58bf8c6b52c8cd63b4d6a19d88cbdc8dfd20580671d183cb +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..968eb886bb56182430d1ae3c03d97fd3ddf966e9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/trainer_state.json @@ -0,0 +1,933 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 8.0, + "eval_steps": 10, + "global_step": 600, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 
7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + 
"learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, 
+ "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 
2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 6.266666666666667, + 
"grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.0799033641815186, + "learning_rate": 1.9555555555555557e-05, + "loss": 0.7604, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 1.8695520162582397, + "eval_runtime": 43.8581, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 2.419006109237671, + "learning_rate": 1.837037037037037e-05, + "loss": 0.7594, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 1.858919382095337, + "eval_runtime": 43.8489, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 2.5701613426208496, + "learning_rate": 1.7185185185185185e-05, + "loss": 0.8374, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 1.8769150972366333, + "eval_runtime": 43.8618, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 5.503419876098633, + "learning_rate": 1.6000000000000003e-05, + "loss": 0.7094, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 1.9237473011016846, + "eval_runtime": 43.8515, + "eval_samples_per_second": 22.804, + "eval_steps_per_second": 2.851, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.483323574066162, + "learning_rate": 1.4814814814814815e-05, + "loss": 0.692, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 1.9152519702911377, + "eval_runtime": 43.8396, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.480727195739746, + "learning_rate": 1.362962962962963e-05, + "loss": 0.6614, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 1.9116826057434082, + "eval_runtime": 43.8448, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.9647531509399414, + "learning_rate": 1.2444444444444446e-05, + "loss": 0.8231, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 1.9044451713562012, + "eval_runtime": 43.8427, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 570 + }, + { + "epoch": 7.733333333333333, + 
"grad_norm": 3.6963553428649902, + "learning_rate": 1.125925925925926e-05, + "loss": 0.7385, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 1.9102691411972046, + "eval_runtime": 43.8393, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 2.862262725830078, + "learning_rate": 1.0074074074074074e-05, + "loss": 0.7405, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 1.9190014600753784, + "eval_runtime": 43.8405, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 590 + }, + { + "epoch": 8.0, + "grad_norm": 3.0339815616607666, + "learning_rate": 8.888888888888888e-06, + "loss": 0.7132, + "step": 600 + }, + { + "epoch": 8.0, + "eval_loss": 1.9125250577926636, + "eval_runtime": 43.8318, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 600 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 9.8316431917056e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use 
[optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" 
+ ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..b640cb87d42d73ef3e666695734b9a4a5d2e726b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:23e053da7cff1474237bf96cdfc5f1d686f0687dbe265898f2ae508ab89bc04f +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..78c803d8afb7aaae078cdd14c00d8b62fd730564 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5d900ae673147b577280a50950cf39138de6592abd3c415878d09a5e6d09c227 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..6a970899a5edc16268fdea83560e0495a3d06810 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fa5b53289977451ca52671d3897055616936322daf22f6e4246ff72a467aef1c +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..9ae36545a6cbe48ac387e9a4edd1288050b062ff --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b0de73605756aa391aaa9ea36adcbd12bd865860a2561b0aaca0c704b25cfe02 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/trainer_state.json new file mode 100644 index 
0000000000000000000000000000000000000000..aa6575338096b980c22f5b212d6e263f3383eb96 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/trainer_state.json @@ -0,0 +1,948 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 8.133333333333333, + "eval_steps": 10, + "global_step": 610, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + 
"eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + 
"eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + 
"eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + 
"eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.0799033641815186, + "learning_rate": 1.9555555555555557e-05, + "loss": 0.7604, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 1.8695520162582397, + "eval_runtime": 43.8581, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 2.419006109237671, + "learning_rate": 1.837037037037037e-05, + "loss": 0.7594, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 1.858919382095337, + "eval_runtime": 43.8489, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 2.5701613426208496, + "learning_rate": 1.7185185185185185e-05, + "loss": 0.8374, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 1.8769150972366333, + 
"eval_runtime": 43.8618, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 5.503419876098633, + "learning_rate": 1.6000000000000003e-05, + "loss": 0.7094, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 1.9237473011016846, + "eval_runtime": 43.8515, + "eval_samples_per_second": 22.804, + "eval_steps_per_second": 2.851, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.483323574066162, + "learning_rate": 1.4814814814814815e-05, + "loss": 0.692, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 1.9152519702911377, + "eval_runtime": 43.8396, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.480727195739746, + "learning_rate": 1.362962962962963e-05, + "loss": 0.6614, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 1.9116826057434082, + "eval_runtime": 43.8448, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.9647531509399414, + "learning_rate": 1.2444444444444446e-05, + "loss": 0.8231, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 1.9044451713562012, + "eval_runtime": 43.8427, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 570 + }, + { + "epoch": 7.733333333333333, + "grad_norm": 3.6963553428649902, + "learning_rate": 1.125925925925926e-05, + "loss": 0.7385, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 1.9102691411972046, + "eval_runtime": 43.8393, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 2.862262725830078, + "learning_rate": 1.0074074074074074e-05, + "loss": 0.7405, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 1.9190014600753784, + "eval_runtime": 43.8405, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 590 + }, + { + "epoch": 8.0, + "grad_norm": 3.0339815616607666, + "learning_rate": 8.888888888888888e-06, + "loss": 0.7132, + "step": 600 + }, + { + "epoch": 8.0, + "eval_loss": 1.9125250577926636, + "eval_runtime": 43.8318, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 600 + }, + { + "epoch": 8.133333333333333, + "grad_norm": 2.858896017074585, + "learning_rate": 7.703703703703704e-06, + "loss": 0.6342, + "step": 610 + }, + { + "epoch": 8.133333333333333, + "eval_loss": 1.9326601028442383, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 610 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 9.99550391156736e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/training_args.bin new file mode 100644 index 
0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..fc31da4a67446555fdf9396254b6a94fac4ac869 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:37aea42358a7e90088ed6e35af16ee0d090598178b142766adcfe6311263557e +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..824d9e9256bcf6e272a66f1cd3b043a223b36d19 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:43cb8f6c7a3c2ff6d4f95ad8bb4189e198df381d1f90dcc90caf61badfb17845 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..da7e5f0f7045f8fad1c1529974e555cc67b8f5f0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f2b2ce429e00eba0165cdfd527b7ca384fed68ae5660561d0cbc6dbdd51ce7f1 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..8f09b0521c27c995f0878cde37cf7b4138abd8e6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0cecac504a0d6e20c848bc43265028cb51bdbaee46716ad0736302cdd3a2376c +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..954b01eb7195c9f9d943c07922ce0259c8453890 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/trainer_state.json @@ -0,0 +1,963 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 8.266666666666667, + "eval_steps": 10, + "global_step": 620, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + 
"grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 
4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 
6.266666666666667, + "grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.0799033641815186, + "learning_rate": 1.9555555555555557e-05, + "loss": 0.7604, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 1.8695520162582397, + "eval_runtime": 43.8581, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 2.419006109237671, + "learning_rate": 1.837037037037037e-05, + "loss": 0.7594, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 1.858919382095337, + "eval_runtime": 43.8489, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 2.5701613426208496, + "learning_rate": 1.7185185185185185e-05, + "loss": 0.8374, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 1.8769150972366333, + "eval_runtime": 43.8618, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 5.503419876098633, + "learning_rate": 1.6000000000000003e-05, + "loss": 0.7094, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 1.9237473011016846, + "eval_runtime": 43.8515, + "eval_samples_per_second": 22.804, + "eval_steps_per_second": 2.851, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.483323574066162, + "learning_rate": 1.4814814814814815e-05, + "loss": 0.692, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 1.9152519702911377, + "eval_runtime": 43.8396, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.480727195739746, + "learning_rate": 1.362962962962963e-05, + "loss": 0.6614, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 1.9116826057434082, + "eval_runtime": 43.8448, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.9647531509399414, + "learning_rate": 1.2444444444444446e-05, + "loss": 0.8231, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 1.9044451713562012, + "eval_runtime": 43.8427, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 570 + }, + { + "epoch": 
7.733333333333333, + "grad_norm": 3.6963553428649902, + "learning_rate": 1.125925925925926e-05, + "loss": 0.7385, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 1.9102691411972046, + "eval_runtime": 43.8393, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 2.862262725830078, + "learning_rate": 1.0074074074074074e-05, + "loss": 0.7405, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 1.9190014600753784, + "eval_runtime": 43.8405, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 590 + }, + { + "epoch": 8.0, + "grad_norm": 3.0339815616607666, + "learning_rate": 8.888888888888888e-06, + "loss": 0.7132, + "step": 600 + }, + { + "epoch": 8.0, + "eval_loss": 1.9125250577926636, + "eval_runtime": 43.8318, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 600 + }, + { + "epoch": 8.133333333333333, + "grad_norm": 2.858896017074585, + "learning_rate": 7.703703703703704e-06, + "loss": 0.6342, + "step": 610 + }, + { + "epoch": 8.133333333333333, + "eval_loss": 1.9326601028442383, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 610 + }, + { + "epoch": 8.266666666666667, + "grad_norm": 3.1907079219818115, + "learning_rate": 6.51851851851852e-06, + "loss": 0.725, + "step": 620 + }, + { + "epoch": 8.266666666666667, + "eval_loss": 1.9524545669555664, + "eval_runtime": 43.8273, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 620 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.015936463142912e+17, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + 
+## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..2922c312c0f9d06d080ba8cc142d89956c1cac07 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a9b7389944ec5665f7acd028da0befedf34936da09be7e17f2fb291ccc7c79d8 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..7bbd3ae2ae85e746452b911b0d2e4d25da9c4689 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ebbbed8e4d44e3190399479e4e1d65338247aac94bb3993b6596ae130cecbec4 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..96d7a3f6be074e46014211fae837a521e5c5140c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dd6c4f62bed5401eddcf930d960632a48c624bea715ca64cedd7d04db198b4a0 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..31b434c7d46bacc0a45ac73d9e6264e373e131cd --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9d8d4a20c36091528ac87a7edc5845454d614d78ad71a59c7a4ae563b2fe291f +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..cd16c9ce745f92028de86c3ad1d5d6efab3ff11a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/trainer_state.json @@ -0,0 +1,978 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 8.4, + "eval_steps": 10, + "global_step": 630, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 
7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + 
"learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, 
+ "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 
2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 6.266666666666667, + 
"grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.0799033641815186, + "learning_rate": 1.9555555555555557e-05, + "loss": 0.7604, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 1.8695520162582397, + "eval_runtime": 43.8581, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 2.419006109237671, + "learning_rate": 1.837037037037037e-05, + "loss": 0.7594, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 1.858919382095337, + "eval_runtime": 43.8489, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 2.5701613426208496, + "learning_rate": 1.7185185185185185e-05, + "loss": 0.8374, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 1.8769150972366333, + "eval_runtime": 43.8618, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 5.503419876098633, + "learning_rate": 1.6000000000000003e-05, + "loss": 0.7094, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 1.9237473011016846, + "eval_runtime": 43.8515, + "eval_samples_per_second": 22.804, + "eval_steps_per_second": 2.851, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.483323574066162, + "learning_rate": 1.4814814814814815e-05, + "loss": 0.692, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 1.9152519702911377, + "eval_runtime": 43.8396, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.480727195739746, + "learning_rate": 1.362962962962963e-05, + "loss": 0.6614, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 1.9116826057434082, + "eval_runtime": 43.8448, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.9647531509399414, + "learning_rate": 1.2444444444444446e-05, + "loss": 0.8231, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 1.9044451713562012, + "eval_runtime": 43.8427, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 570 + }, + { + "epoch": 7.733333333333333, + 
"grad_norm": 3.6963553428649902, + "learning_rate": 1.125925925925926e-05, + "loss": 0.7385, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 1.9102691411972046, + "eval_runtime": 43.8393, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 2.862262725830078, + "learning_rate": 1.0074074074074074e-05, + "loss": 0.7405, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 1.9190014600753784, + "eval_runtime": 43.8405, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 590 + }, + { + "epoch": 8.0, + "grad_norm": 3.0339815616607666, + "learning_rate": 8.888888888888888e-06, + "loss": 0.7132, + "step": 600 + }, + { + "epoch": 8.0, + "eval_loss": 1.9125250577926636, + "eval_runtime": 43.8318, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 600 + }, + { + "epoch": 8.133333333333333, + "grad_norm": 2.858896017074585, + "learning_rate": 7.703703703703704e-06, + "loss": 0.6342, + "step": 610 + }, + { + "epoch": 8.133333333333333, + "eval_loss": 1.9326601028442383, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 610 + }, + { + "epoch": 8.266666666666667, + "grad_norm": 3.1907079219818115, + "learning_rate": 6.51851851851852e-06, + "loss": 0.725, + "step": 620 + }, + { + "epoch": 8.266666666666667, + "eval_loss": 1.9524545669555664, + "eval_runtime": 43.8273, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 620 + }, + { + "epoch": 8.4, + "grad_norm": 2.529897928237915, + "learning_rate": 5.333333333333334e-06, + "loss": 0.7053, + "step": 630 + }, + { + "epoch": 8.4, + "eval_loss": 1.9474350214004517, + "eval_runtime": 43.8504, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 630 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.032322535129088e+17, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- 
/dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..ef9070d50c97bf3dc4eecf4260c3f8e026b21c65 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:896587d37fd57393163aa62a3d10665df35d0b39417f5d88394d4c652f385b23 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..f390db929d937ef4d325692648fee91de15ef38a --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dd3ba4f24bdafe169dd72bb960bc7f6dc1d4189495210fac31b63e8a6fa80b11 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..bc02fa7e506af341c87e94bd62a6cbdfbd057096 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0597f3b9ac321e002676eb1712670348770197d9b197cdd7a7e16f465315444e +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..cc14f7545324288e67e156d036369e2cebdcf74f --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dd2d3d121570090627f59257118b55358f83f1b060f0fb11ab062387addadff4 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..0e158cb00cd7fa5f50fd6d377074ced9020b18b1 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/trainer_state.json @@ -0,0 +1,993 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 8.533333333333333, + "eval_steps": 10, + "global_step": 640, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + 
"grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 
4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 
6.266666666666667, + "grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.0799033641815186, + "learning_rate": 1.9555555555555557e-05, + "loss": 0.7604, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 1.8695520162582397, + "eval_runtime": 43.8581, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 2.419006109237671, + "learning_rate": 1.837037037037037e-05, + "loss": 0.7594, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 1.858919382095337, + "eval_runtime": 43.8489, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 2.5701613426208496, + "learning_rate": 1.7185185185185185e-05, + "loss": 0.8374, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 1.8769150972366333, + "eval_runtime": 43.8618, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 5.503419876098633, + "learning_rate": 1.6000000000000003e-05, + "loss": 0.7094, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 1.9237473011016846, + "eval_runtime": 43.8515, + "eval_samples_per_second": 22.804, + "eval_steps_per_second": 2.851, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.483323574066162, + "learning_rate": 1.4814814814814815e-05, + "loss": 0.692, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 1.9152519702911377, + "eval_runtime": 43.8396, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.480727195739746, + "learning_rate": 1.362962962962963e-05, + "loss": 0.6614, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 1.9116826057434082, + "eval_runtime": 43.8448, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.9647531509399414, + "learning_rate": 1.2444444444444446e-05, + "loss": 0.8231, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 1.9044451713562012, + "eval_runtime": 43.8427, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 570 + }, + { + "epoch": 
7.733333333333333, + "grad_norm": 3.6963553428649902, + "learning_rate": 1.125925925925926e-05, + "loss": 0.7385, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 1.9102691411972046, + "eval_runtime": 43.8393, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 2.862262725830078, + "learning_rate": 1.0074074074074074e-05, + "loss": 0.7405, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 1.9190014600753784, + "eval_runtime": 43.8405, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 590 + }, + { + "epoch": 8.0, + "grad_norm": 3.0339815616607666, + "learning_rate": 8.888888888888888e-06, + "loss": 0.7132, + "step": 600 + }, + { + "epoch": 8.0, + "eval_loss": 1.9125250577926636, + "eval_runtime": 43.8318, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 600 + }, + { + "epoch": 8.133333333333333, + "grad_norm": 2.858896017074585, + "learning_rate": 7.703703703703704e-06, + "loss": 0.6342, + "step": 610 + }, + { + "epoch": 8.133333333333333, + "eval_loss": 1.9326601028442383, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 610 + }, + { + "epoch": 8.266666666666667, + "grad_norm": 3.1907079219818115, + "learning_rate": 6.51851851851852e-06, + "loss": 0.725, + "step": 620 + }, + { + "epoch": 8.266666666666667, + "eval_loss": 1.9524545669555664, + "eval_runtime": 43.8273, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 620 + }, + { + "epoch": 8.4, + "grad_norm": 2.529897928237915, + "learning_rate": 5.333333333333334e-06, + "loss": 0.7053, + "step": 630 + }, + { + "epoch": 8.4, + "eval_loss": 1.9474350214004517, + "eval_runtime": 43.8504, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 630 + }, + { + "epoch": 8.533333333333333, + "grad_norm": 3.163334608078003, + "learning_rate": 4.1481481481481485e-06, + "loss": 0.6571, + "step": 640 + }, + { + "epoch": 8.533333333333333, + "eval_loss": 1.9447112083435059, + "eval_runtime": 43.8445, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 640 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.048708607115264e+17, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..865375361c2c1ad64bb66e9b2d234a6a81bc7146 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:edb565ded538f7d4beea71039d37da309c61669df8cff636bb1846243fe4cf20 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..28c34b1d72b6fe625d0c659e59e10acf1296ef28 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f3a67b8118eb3f0beed3b282f3f7341305a5e05d58325693a247b56f1f41d874 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..4d763156eb3a586b51733d4ec683a815a6ae5fab --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e66e316bd2615a5005aac13970f8b8e71830843ea716191e53ff7dc38997af08 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..75f881b3ec9ba86b1878709fb0af361a6f712546 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:31b02e0b7ebaaab7bf8f183e3b47970500df166e496df9fdd39405913db43e64 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..68326726b2ae3baf57fe43443fe380efe288ed24 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/trainer_state.json @@ -0,0 +1,1008 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 8.666666666666666, + "eval_steps": 10, + "global_step": 650, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + 
"grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 
4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 
6.266666666666667, + "grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.0799033641815186, + "learning_rate": 1.9555555555555557e-05, + "loss": 0.7604, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 1.8695520162582397, + "eval_runtime": 43.8581, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 2.419006109237671, + "learning_rate": 1.837037037037037e-05, + "loss": 0.7594, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 1.858919382095337, + "eval_runtime": 43.8489, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 2.5701613426208496, + "learning_rate": 1.7185185185185185e-05, + "loss": 0.8374, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 1.8769150972366333, + "eval_runtime": 43.8618, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 5.503419876098633, + "learning_rate": 1.6000000000000003e-05, + "loss": 0.7094, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 1.9237473011016846, + "eval_runtime": 43.8515, + "eval_samples_per_second": 22.804, + "eval_steps_per_second": 2.851, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.483323574066162, + "learning_rate": 1.4814814814814815e-05, + "loss": 0.692, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 1.9152519702911377, + "eval_runtime": 43.8396, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.480727195739746, + "learning_rate": 1.362962962962963e-05, + "loss": 0.6614, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 1.9116826057434082, + "eval_runtime": 43.8448, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.9647531509399414, + "learning_rate": 1.2444444444444446e-05, + "loss": 0.8231, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 1.9044451713562012, + "eval_runtime": 43.8427, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 570 + }, + { + "epoch": 
7.733333333333333, + "grad_norm": 3.6963553428649902, + "learning_rate": 1.125925925925926e-05, + "loss": 0.7385, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 1.9102691411972046, + "eval_runtime": 43.8393, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 2.862262725830078, + "learning_rate": 1.0074074074074074e-05, + "loss": 0.7405, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 1.9190014600753784, + "eval_runtime": 43.8405, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 590 + }, + { + "epoch": 8.0, + "grad_norm": 3.0339815616607666, + "learning_rate": 8.888888888888888e-06, + "loss": 0.7132, + "step": 600 + }, + { + "epoch": 8.0, + "eval_loss": 1.9125250577926636, + "eval_runtime": 43.8318, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 600 + }, + { + "epoch": 8.133333333333333, + "grad_norm": 2.858896017074585, + "learning_rate": 7.703703703703704e-06, + "loss": 0.6342, + "step": 610 + }, + { + "epoch": 8.133333333333333, + "eval_loss": 1.9326601028442383, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 610 + }, + { + "epoch": 8.266666666666667, + "grad_norm": 3.1907079219818115, + "learning_rate": 6.51851851851852e-06, + "loss": 0.725, + "step": 620 + }, + { + "epoch": 8.266666666666667, + "eval_loss": 1.9524545669555664, + "eval_runtime": 43.8273, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 620 + }, + { + "epoch": 8.4, + "grad_norm": 2.529897928237915, + "learning_rate": 5.333333333333334e-06, + "loss": 0.7053, + "step": 630 + }, + { + "epoch": 8.4, + "eval_loss": 1.9474350214004517, + "eval_runtime": 43.8504, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 630 + }, + { + "epoch": 8.533333333333333, + "grad_norm": 3.163334608078003, + "learning_rate": 4.1481481481481485e-06, + "loss": 0.6571, + "step": 640 + }, + { + "epoch": 8.533333333333333, + "eval_loss": 1.9447112083435059, + "eval_runtime": 43.8445, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 640 + }, + { + "epoch": 8.666666666666666, + "grad_norm": 2.8849895000457764, + "learning_rate": 2.962962962962963e-06, + "loss": 0.6508, + "step": 650 + }, + { + "epoch": 8.666666666666666, + "eval_loss": 1.9450982809066772, + "eval_runtime": 43.846, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 650 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.06509467910144e+17, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..60ef59e0906c1bf5390043ecfad95a51eb754319 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d4e353e2b66286524f98112239e30f825a07024cdf97eaad1a8e396c3abb7020 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..ca8f37dcf932f856d2754ea6c78172afd3c4b6f8 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3e2c93210414ee457f42f03a9ed1432c36369eaae7a8bc4f3f421d89013ed710 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..1bd0e24dcfea6867dcdb66e0b90f3344dbd9d339 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:66fa7ea9452d536e82e5c18c4a0a05615143763aa569d9af13553a06a11128de +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..403f6f78ce81468eb12e4e1c093d8452c7d5a14e --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e9e3b3eb476269cb66006445e45fa57a95b9d6fbb9998ae81b82199f9b98541e +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..1a1cab2e63fcbf8e07621202b719164f9d60efb9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/trainer_state.json @@ -0,0 +1,1023 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 8.8, + "eval_steps": 10, + "global_step": 660, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 
7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + 
"learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, 
+ "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 
2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 6.266666666666667, + 
"grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.0799033641815186, + "learning_rate": 1.9555555555555557e-05, + "loss": 0.7604, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 1.8695520162582397, + "eval_runtime": 43.8581, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 2.419006109237671, + "learning_rate": 1.837037037037037e-05, + "loss": 0.7594, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 1.858919382095337, + "eval_runtime": 43.8489, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 2.5701613426208496, + "learning_rate": 1.7185185185185185e-05, + "loss": 0.8374, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 1.8769150972366333, + "eval_runtime": 43.8618, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 5.503419876098633, + "learning_rate": 1.6000000000000003e-05, + "loss": 0.7094, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 1.9237473011016846, + "eval_runtime": 43.8515, + "eval_samples_per_second": 22.804, + "eval_steps_per_second": 2.851, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.483323574066162, + "learning_rate": 1.4814814814814815e-05, + "loss": 0.692, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 1.9152519702911377, + "eval_runtime": 43.8396, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.480727195739746, + "learning_rate": 1.362962962962963e-05, + "loss": 0.6614, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 1.9116826057434082, + "eval_runtime": 43.8448, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.9647531509399414, + "learning_rate": 1.2444444444444446e-05, + "loss": 0.8231, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 1.9044451713562012, + "eval_runtime": 43.8427, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 570 + }, + { + "epoch": 7.733333333333333, + 
"grad_norm": 3.6963553428649902, + "learning_rate": 1.125925925925926e-05, + "loss": 0.7385, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 1.9102691411972046, + "eval_runtime": 43.8393, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 2.862262725830078, + "learning_rate": 1.0074074074074074e-05, + "loss": 0.7405, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 1.9190014600753784, + "eval_runtime": 43.8405, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 590 + }, + { + "epoch": 8.0, + "grad_norm": 3.0339815616607666, + "learning_rate": 8.888888888888888e-06, + "loss": 0.7132, + "step": 600 + }, + { + "epoch": 8.0, + "eval_loss": 1.9125250577926636, + "eval_runtime": 43.8318, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 600 + }, + { + "epoch": 8.133333333333333, + "grad_norm": 2.858896017074585, + "learning_rate": 7.703703703703704e-06, + "loss": 0.6342, + "step": 610 + }, + { + "epoch": 8.133333333333333, + "eval_loss": 1.9326601028442383, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 610 + }, + { + "epoch": 8.266666666666667, + "grad_norm": 3.1907079219818115, + "learning_rate": 6.51851851851852e-06, + "loss": 0.725, + "step": 620 + }, + { + "epoch": 8.266666666666667, + "eval_loss": 1.9524545669555664, + "eval_runtime": 43.8273, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 620 + }, + { + "epoch": 8.4, + "grad_norm": 2.529897928237915, + "learning_rate": 5.333333333333334e-06, + "loss": 0.7053, + "step": 630 + }, + { + "epoch": 8.4, + "eval_loss": 1.9474350214004517, + "eval_runtime": 43.8504, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 630 + }, + { + "epoch": 8.533333333333333, + "grad_norm": 3.163334608078003, + "learning_rate": 4.1481481481481485e-06, + "loss": 0.6571, + "step": 640 + }, + { + "epoch": 8.533333333333333, + "eval_loss": 1.9447112083435059, + "eval_runtime": 43.8445, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 640 + }, + { + "epoch": 8.666666666666666, + "grad_norm": 2.8849895000457764, + "learning_rate": 2.962962962962963e-06, + "loss": 0.6508, + "step": 650 + }, + { + "epoch": 8.666666666666666, + "eval_loss": 1.9450982809066772, + "eval_runtime": 43.846, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 650 + }, + { + "epoch": 8.8, + "grad_norm": 4.1778717041015625, + "learning_rate": 1.777777777777778e-06, + "loss": 0.7209, + "step": 660 + }, + { + "epoch": 8.8, + "eval_loss": 1.9446152448654175, + "eval_runtime": 43.8418, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 660 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.081480751087616e+17, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/training_args.bin 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..87b876fdda629555e5638a22e4a65e7359936f41 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:11405ab7f3d43e311d0535d532f862240585bf306ae29726ec7363f3a1ca70a3 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..eee64ee76513cbea309f0bf51643e487f330cbb9 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7d52a65810bf1c7e6f473dab9d588e2e145a19b92bdd5771b35fe28a98d86c31 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..b50ed8357a00070f99a52843c3e3d150dbd5b1aa --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5bb0850ed44e50e4ccb2afc9aab9a80c17a31208454b069930105956f7f9a183 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..72ac93ee4249bf1220c3ed82f099c14ae0267a68 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:886b6be563b163a73eaac3a0ce905ce45ea5202bed173e897fec04ed18434edc +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..48f7f9c5072eabfc63c6e0dd70d31c5a6fe01ba9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/trainer_state.json @@ -0,0 +1,1038 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 8.933333333333334, + "eval_steps": 10, + "global_step": 670, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + 
"learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 
0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + 
"grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 
4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 
6.266666666666667, + "grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.0799033641815186, + "learning_rate": 1.9555555555555557e-05, + "loss": 0.7604, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 1.8695520162582397, + "eval_runtime": 43.8581, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 2.419006109237671, + "learning_rate": 1.837037037037037e-05, + "loss": 0.7594, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 1.858919382095337, + "eval_runtime": 43.8489, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 2.5701613426208496, + "learning_rate": 1.7185185185185185e-05, + "loss": 0.8374, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 1.8769150972366333, + "eval_runtime": 43.8618, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 5.503419876098633, + "learning_rate": 1.6000000000000003e-05, + "loss": 0.7094, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 1.9237473011016846, + "eval_runtime": 43.8515, + "eval_samples_per_second": 22.804, + "eval_steps_per_second": 2.851, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.483323574066162, + "learning_rate": 1.4814814814814815e-05, + "loss": 0.692, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 1.9152519702911377, + "eval_runtime": 43.8396, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.480727195739746, + "learning_rate": 1.362962962962963e-05, + "loss": 0.6614, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 1.9116826057434082, + "eval_runtime": 43.8448, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.9647531509399414, + "learning_rate": 1.2444444444444446e-05, + "loss": 0.8231, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 1.9044451713562012, + "eval_runtime": 43.8427, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 570 + }, + { + "epoch": 
7.733333333333333, + "grad_norm": 3.6963553428649902, + "learning_rate": 1.125925925925926e-05, + "loss": 0.7385, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 1.9102691411972046, + "eval_runtime": 43.8393, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 2.862262725830078, + "learning_rate": 1.0074074074074074e-05, + "loss": 0.7405, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 1.9190014600753784, + "eval_runtime": 43.8405, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 590 + }, + { + "epoch": 8.0, + "grad_norm": 3.0339815616607666, + "learning_rate": 8.888888888888888e-06, + "loss": 0.7132, + "step": 600 + }, + { + "epoch": 8.0, + "eval_loss": 1.9125250577926636, + "eval_runtime": 43.8318, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 600 + }, + { + "epoch": 8.133333333333333, + "grad_norm": 2.858896017074585, + "learning_rate": 7.703703703703704e-06, + "loss": 0.6342, + "step": 610 + }, + { + "epoch": 8.133333333333333, + "eval_loss": 1.9326601028442383, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 610 + }, + { + "epoch": 8.266666666666667, + "grad_norm": 3.1907079219818115, + "learning_rate": 6.51851851851852e-06, + "loss": 0.725, + "step": 620 + }, + { + "epoch": 8.266666666666667, + "eval_loss": 1.9524545669555664, + "eval_runtime": 43.8273, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 620 + }, + { + "epoch": 8.4, + "grad_norm": 2.529897928237915, + "learning_rate": 5.333333333333334e-06, + "loss": 0.7053, + "step": 630 + }, + { + "epoch": 8.4, + "eval_loss": 1.9474350214004517, + "eval_runtime": 43.8504, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 630 + }, + { + "epoch": 8.533333333333333, + "grad_norm": 3.163334608078003, + "learning_rate": 4.1481481481481485e-06, + "loss": 0.6571, + "step": 640 + }, + { + "epoch": 8.533333333333333, + "eval_loss": 1.9447112083435059, + "eval_runtime": 43.8445, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 640 + }, + { + "epoch": 8.666666666666666, + "grad_norm": 2.8849895000457764, + "learning_rate": 2.962962962962963e-06, + "loss": 0.6508, + "step": 650 + }, + { + "epoch": 8.666666666666666, + "eval_loss": 1.9450982809066772, + "eval_runtime": 43.846, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 650 + }, + { + "epoch": 8.8, + "grad_norm": 4.1778717041015625, + "learning_rate": 1.777777777777778e-06, + "loss": 0.7209, + "step": 660 + }, + { + "epoch": 8.8, + "eval_loss": 1.9446152448654175, + "eval_runtime": 43.8418, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 660 + }, + { + "epoch": 8.933333333333334, + "grad_norm": 8.232586860656738, + "learning_rate": 5.925925925925927e-07, + "loss": 0.7359, + "step": 670 + }, + { + "epoch": 8.933333333333334, + "eval_loss": 1.9448904991149902, + "eval_runtime": 43.8444, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 670 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + 
}, + "attributes": {} + } + }, + "total_flos": 1.097866823073792e+17, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/adapter_model.safetensors new 
file mode 100644 index 0000000000000000000000000000000000000000..0c475464b386da0df76c85c83f3903b095160b90 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0437d47b4025f556166106fb1432f893bf21d51f650e1e92ca167de41560b01b +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..a8a1aa6b2cbb351f5f846aa35f98a61d86fea782 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:36516dedfa3e3ca9e4620508b681621ae4657a1e4a31c2d1d2427fbc6cc5dec8 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..bb61823d0d78956427b74dd1a3fc741ba1b2381f --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c44717b587bf877ea1a37c7f5747a93e45e34ce231c845a31a9b8a042ee22593 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..6f069e8ad5743a7071d53989d6edf25a382b7133 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d33603c9602f50d32bd619f686fa4097b405a474d15f526ce09de1176943edee +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..fa9e835b088c30379449147e420f6701be4cbc86 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/trainer_state.json @@ -0,0 +1,1038 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 9.0, + "eval_steps": 10, + 
"global_step": 675, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5754441618919373, + "learning_rate": 6.814814814814815e-05, + "loss": 1.3188, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.5012128353118896, + "eval_runtime": 43.8365, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5712454319000244, + "learning_rate": 6.696296296296296e-05, + "loss": 1.339, + "step": 110 + }, + { + "epoch": 
1.4666666666666668, + "eval_loss": 1.5031358003616333, + "eval_runtime": 43.8379, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.656009316444397, + "learning_rate": 6.577777777777777e-05, + "loss": 1.4003, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.504779577255249, + "eval_runtime": 43.8373, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6610302925109863, + "learning_rate": 6.45925925925926e-05, + "loss": 1.4493, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.5054454803466797, + "eval_runtime": 43.8378, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7779592871665955, + "learning_rate": 6.340740740740741e-05, + "loss": 1.3202, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.5059704780578613, + "eval_runtime": 43.8382, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7496727108955383, + "learning_rate": 6.222222222222223e-05, + "loss": 1.4006, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.504396677017212, + "eval_runtime": 43.837, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.863124430179596, + "learning_rate": 6.103703703703704e-05, + "loss": 1.2027, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.5339263677597046, + "eval_runtime": 43.8601, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.184730052947998, + "learning_rate": 5.9851851851851855e-05, + "loss": 1.2097, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.5504236221313477, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0334141254425049, + "learning_rate": 5.8666666666666665e-05, + "loss": 1.2706, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.5586867332458496, + "eval_runtime": 43.8358, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2939094305038452, + "learning_rate": 5.748148148148149e-05, + "loss": 1.2847, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.5594414472579956, + "eval_runtime": 43.844, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.16665518283844, + "learning_rate": 5.62962962962963e-05, + "loss": 1.2021, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.5554323196411133, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.79972505569458, + "learning_rate": 5.511111111111112e-05, + "loss": 1.1703, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.563900351524353, + "eval_runtime": 43.8364, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.3145605325698853, + "learning_rate": 5.392592592592593e-05, + "loss": 1.2359, + "step": 220 + }, + { + 
"epoch": 2.9333333333333336, + "eval_loss": 1.5570547580718994, + "eval_runtime": 43.8325, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.2332741022109985, + "learning_rate": 5.274074074074074e-05, + "loss": 1.2145, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.5824921131134033, + "eval_runtime": 43.85, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.6553075313568115, + "learning_rate": 5.155555555555556e-05, + "loss": 0.9549, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.6472513675689697, + "eval_runtime": 43.8532, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.6104933023452759, + "learning_rate": 5.037037037037037e-05, + "loss": 1.0582, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.6317455768585205, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 2.012712001800537, + "learning_rate": 4.918518518518519e-05, + "loss": 0.977, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.6610422134399414, + "eval_runtime": 43.8406, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.6339073181152344, + "learning_rate": 4.8e-05, + "loss": 1.0928, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.633288860321045, + "eval_runtime": 43.8388, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.5756804943084717, + "learning_rate": 4.681481481481481e-05, + "loss": 1.0711, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.6435223817825317, + "eval_runtime": 43.8356, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.2481768131256104, + "learning_rate": 4.5629629629629636e-05, + "loss": 1.1376, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.6346238851547241, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.5779513120651245, + "learning_rate": 4.444444444444445e-05, + "loss": 1.1164, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.6386021375656128, + "eval_runtime": 43.8402, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 1.8511667251586914, + "learning_rate": 4.3259259259259264e-05, + "loss": 0.9455, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 1.7448314428329468, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 1.7279369831085205, + "learning_rate": 4.2074074074074075e-05, + "loss": 0.9396, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 1.7224845886230469, + "eval_runtime": 43.8602, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.856007695198059, + "learning_rate": 4.088888888888889e-05, + "loss": 0.9466, + "step": 330 + }, + 
{ + "epoch": 4.4, + "eval_loss": 1.7211538553237915, + "eval_runtime": 43.8343, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 4.504591464996338, + "learning_rate": 3.970370370370371e-05, + "loss": 0.8971, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 1.7197978496551514, + "eval_runtime": 43.836, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.2341110706329346, + "learning_rate": 3.851851851851852e-05, + "loss": 0.9636, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.7196643352508545, + "eval_runtime": 43.8421, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.2271151542663574, + "learning_rate": 3.733333333333334e-05, + "loss": 1.035, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.725122332572937, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.1980743408203125, + "learning_rate": 3.614814814814815e-05, + "loss": 0.9424, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.7190055847167969, + "eval_runtime": 43.8492, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 1.9299914836883545, + "learning_rate": 3.4962962962962965e-05, + "loss": 0.963, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 1.7638393640518188, + "eval_runtime": 43.8416, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.467031478881836, + "learning_rate": 3.377777777777778e-05, + "loss": 0.8679, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 1.8204401731491089, + "eval_runtime": 43.8339, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.7744836807250977, + "learning_rate": 3.259259259259259e-05, + "loss": 0.9249, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 1.7969322204589844, + "eval_runtime": 43.8499, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 2.61295223236084, + "learning_rate": 3.140740740740741e-05, + "loss": 0.8338, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 1.800340175628662, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.6296002864837646, + "learning_rate": 3.0222222222222225e-05, + "loss": 0.839, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 1.7916167974472046, + "eval_runtime": 43.8469, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.1177334785461426, + "learning_rate": 2.9037037037037042e-05, + "loss": 0.8077, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 1.800661325454712, + "eval_runtime": 43.8551, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 2.4448559284210205, + "learning_rate": 2.7851851851851856e-05, + "loss": 0.8461, + 
"step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 1.8001384735107422, + "eval_runtime": 43.8564, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 2.592481851577759, + "learning_rate": 2.6666666666666667e-05, + "loss": 0.883, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 1.8040210008621216, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 2.9840338230133057, + "learning_rate": 2.5481481481481484e-05, + "loss": 0.7618, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 1.8591012954711914, + "eval_runtime": 43.8587, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.0177509784698486, + "learning_rate": 2.4296296296296298e-05, + "loss": 0.8613, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 1.8731619119644165, + "eval_runtime": 43.8558, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.9949898719787598, + "learning_rate": 2.3111111111111112e-05, + "loss": 0.716, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 1.8551521301269531, + "eval_runtime": 43.853, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 4.913350582122803, + "learning_rate": 2.192592592592593e-05, + "loss": 0.7622, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 1.8626867532730103, + "eval_runtime": 43.855, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 2.100435495376587, + "learning_rate": 2.074074074074074e-05, + "loss": 0.8401, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 1.8561443090438843, + "eval_runtime": 43.8617, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.0799033641815186, + "learning_rate": 1.9555555555555557e-05, + "loss": 0.7604, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 1.8695520162582397, + "eval_runtime": 43.8581, + "eval_samples_per_second": 22.801, + "eval_steps_per_second": 2.85, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 2.419006109237671, + "learning_rate": 1.837037037037037e-05, + "loss": 0.7594, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 1.858919382095337, + "eval_runtime": 43.8489, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 2.5701613426208496, + "learning_rate": 1.7185185185185185e-05, + "loss": 0.8374, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 1.8769150972366333, + "eval_runtime": 43.8618, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 5.503419876098633, + "learning_rate": 1.6000000000000003e-05, + "loss": 0.7094, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 1.9237473011016846, + "eval_runtime": 43.8515, + "eval_samples_per_second": 22.804, + "eval_steps_per_second": 2.851, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.483323574066162, + "learning_rate": 1.4814814814814815e-05, + "loss": 0.692, + 
"step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 1.9152519702911377, + "eval_runtime": 43.8396, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.480727195739746, + "learning_rate": 1.362962962962963e-05, + "loss": 0.6614, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 1.9116826057434082, + "eval_runtime": 43.8448, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.9647531509399414, + "learning_rate": 1.2444444444444446e-05, + "loss": 0.8231, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 1.9044451713562012, + "eval_runtime": 43.8427, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 570 + }, + { + "epoch": 7.733333333333333, + "grad_norm": 3.6963553428649902, + "learning_rate": 1.125925925925926e-05, + "loss": 0.7385, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 1.9102691411972046, + "eval_runtime": 43.8393, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 2.862262725830078, + "learning_rate": 1.0074074074074074e-05, + "loss": 0.7405, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 1.9190014600753784, + "eval_runtime": 43.8405, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 590 + }, + { + "epoch": 8.0, + "grad_norm": 3.0339815616607666, + "learning_rate": 8.888888888888888e-06, + "loss": 0.7132, + "step": 600 + }, + { + "epoch": 8.0, + "eval_loss": 1.9125250577926636, + "eval_runtime": 43.8318, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 600 + }, + { + "epoch": 8.133333333333333, + "grad_norm": 2.858896017074585, + "learning_rate": 7.703703703703704e-06, + "loss": 0.6342, + "step": 610 + }, + { + "epoch": 8.133333333333333, + "eval_loss": 1.9326601028442383, + "eval_runtime": 43.8359, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.852, + "step": 610 + }, + { + "epoch": 8.266666666666667, + "grad_norm": 3.1907079219818115, + "learning_rate": 6.51851851851852e-06, + "loss": 0.725, + "step": 620 + }, + { + "epoch": 8.266666666666667, + "eval_loss": 1.9524545669555664, + "eval_runtime": 43.8273, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 620 + }, + { + "epoch": 8.4, + "grad_norm": 2.529897928237915, + "learning_rate": 5.333333333333334e-06, + "loss": 0.7053, + "step": 630 + }, + { + "epoch": 8.4, + "eval_loss": 1.9474350214004517, + "eval_runtime": 43.8504, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 630 + }, + { + "epoch": 8.533333333333333, + "grad_norm": 3.163334608078003, + "learning_rate": 4.1481481481481485e-06, + "loss": 0.6571, + "step": 640 + }, + { + "epoch": 8.533333333333333, + "eval_loss": 1.9447112083435059, + "eval_runtime": 43.8445, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 640 + }, + { + "epoch": 8.666666666666666, + "grad_norm": 2.8849895000457764, + "learning_rate": 2.962962962962963e-06, + "loss": 0.6508, + "step": 650 + }, + { + "epoch": 8.666666666666666, + "eval_loss": 1.9450982809066772, + "eval_runtime": 43.846, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 650 + }, + { + "epoch": 8.8, + "grad_norm": 4.1778717041015625, + "learning_rate": 1.777777777777778e-06, + 
"loss": 0.7209, + "step": 660 + }, + { + "epoch": 8.8, + "eval_loss": 1.9446152448654175, + "eval_runtime": 43.8418, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 660 + }, + { + "epoch": 8.933333333333334, + "grad_norm": 8.232586860656738, + "learning_rate": 5.925925925925927e-07, + "loss": 0.7359, + "step": 670 + }, + { + "epoch": 8.933333333333334, + "eval_loss": 1.9448904991149902, + "eval_runtime": 43.8444, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 670 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": true + }, + "attributes": {} + } + }, + "total_flos": 1.10605985906688e+17, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. 
+ +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..d693a4a9c3245e0eab5dbf8beeaa8489882999ef --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0af632164573816dd6fa929dfcba66cb9b3e7049cbccb833788dcd664dc4d725 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..2ddb3b4d10bc66b123850e7a15d13a4889b7779d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:47f3502a659061f4499b17804d130b08f296ee9f064777db01941fd8bfd953e6 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..2b1c959e3b92a9d3847cd61e595c79a1813cfe3a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0bf8faccd3d2ca94b80304c3092e394e13d076f35c0c4f51d74490ac3412d5f9 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..edc613be9a8a7736c1c5e6c411193a18eb94121c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:116e4caee7c9274e6f2a7d93ee5e67e259426d00592030a182ec1bf7e3e1fd99 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..3764bb274ec196c4bad6f67e1e4fb6c4b78ca859 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/trainer_state.json @@ -0,0 +1,138 @@ +{ + "best_metric": 1.493536353111267, + "best_model_checkpoint": 
"./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70", + "epoch": 0.9333333333333333, + "eval_steps": 10, + "global_step": 70, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.14702503903232e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..203c2aee657a0c57749f99d7288471cc9cf91618 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c70e8a7a316b3926d9f462c970ec41c074914d4a87e36977084fd2f74bf7f7f6 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..735fa28fda2a77e371be09aa88a9b9ddd229c779 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e46bb7373373acdd6c4212cd840dada429811bfb18f9c9e7220a9d4b7a8cb810 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..0b228b8e8106f666fe286c5d131d496d926a7df4 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:debbe8bbbf3d0dfd719072ab48974c332b6f78ebe25ef99f5002c8d0a8c8c380 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..8bf25b5c8780313aa53c49c9a020653afda88fbe --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8e54696b8c39c3b120a2b1d4d03623aee6400315f6e759074fafe42342c8bf95 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..9e7b348d4a963d02cff0e97eaf961e401f1c35d1 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/trainer_state.json @@ -0,0 +1,153 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 1.0666666666666667, + "eval_steps": 10, + "global_step": 80, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 
7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.31088575889408e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..b765869b9f6a72d3efa66557e75accad4166200b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6a4f8972efcd37cf568058f493946ea3327565a148050e73749e5acc065e657d +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..8dcd9e248fc7d866236565db19dcdab38ee6f945 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:aef9bbba58d20df3821454fa746d41c11e4dbb9103c1277e6527d110e8741996 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..4041231f7cc289aaec627b941b3ce1ed104a3678 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5e1884689751e2c9aa53b83d7472089621e5727e27a037b479e2287c7b208b1a +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..d1e19095e23644fde7d19dd9320fdb8daf7fd2bd --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:28209e35c6873af016e1c69801c50fdb913d066bb8fab0d3da00cafc566c1a5c +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..686ad7f8a1f9f271e88ea83fa71bb7d8db81cfeb --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/trainer_state.json @@ -0,0 +1,168 @@ +{ + "best_metric": 1.49330472946167, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 1.2, + "eval_steps": 10, + "global_step": 90, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.4169997572898865, + "learning_rate": 7.881481481481482e-05, + "loss": 1.4, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.5159306526184082, + "eval_runtime": 43.849, + "eval_samples_per_second": 22.806, + "eval_steps_per_second": 2.851, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4059029817581177, + "learning_rate": 7.762962962962963e-05, + "loss": 1.4877, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.5063730478286743, + "eval_runtime": 43.8457, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.33643633127212524, + "learning_rate": 
7.644444444444445e-05, + "loss": 1.4719, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.4993542432785034, + "eval_runtime": 43.8471, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.319843590259552, + "learning_rate": 7.525925925925926e-05, + "loss": 1.387, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.4972199201583862, + "eval_runtime": 43.8591, + "eval_samples_per_second": 22.8, + "eval_steps_per_second": 2.85, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.33250561356544495, + "learning_rate": 7.407407407407409e-05, + "loss": 1.4814, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.496050238609314, + "eval_runtime": 43.8621, + "eval_samples_per_second": 22.799, + "eval_steps_per_second": 2.85, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.31121477484703064, + "learning_rate": 7.28888888888889e-05, + "loss": 1.3654, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.4949277639389038, + "eval_runtime": 43.8247, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.31646546721458435, + "learning_rate": 7.170370370370371e-05, + "loss": 1.3506, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.493536353111267, + "eval_runtime": 43.8447, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3309181034564972, + "learning_rate": 7.051851851851853e-05, + "loss": 1.4211, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.49330472946167, + "eval_runtime": 43.8337, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.36814063787460327, + "learning_rate": 6.933333333333334e-05, + "loss": 1.1364, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.4979619979858398, + "eval_runtime": 43.8386, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 90 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.47474647875584e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..84a5f76784fc713d880ce89696fb38dfc0b8cf7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1f3c600fce0ca92a582068910275e58655dbbba3eefa41339aa611b9d11e0fa7 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/README.md 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..8a912eb34234a95df02d6a66e82692a4b726e73b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:90d499df6b40ab5442dbfd39b2a4c936f9a6f4a61799cd148fdf44fb850754f1 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..cf4ee5d3e6af9b88eb054656a3869866e984eaa6 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b7c36fb7be07d6e9964d1e266751698f301408788be22820ec04f4ef371e221f +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..d0cb160fc6752dc0470bb88b1ba16dca7ed969ca --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fd418aa175a4f9508778329e5c11f54241882ad7316c344103bc3804e613599f +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..c7c7ba5a5d73c30d2e2dfccf92552709b61b1a0f --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d4f7e5b3f15e6248eb69742a14f905c700ecf357f80b4e2f91b8b83b2a38d15e +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..7349e777b86dd00e29db9c93a062a6e89f2861df --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/trainer_state.json @@ -0,0 +1,48 @@ +{ + "best_metric": 1.8035988807678223, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10", + "epoch": 0.13333333333333333, + "eval_steps": 10, + "global_step": 10, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 
1638607198617600.0, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-10/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
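+A minimal loading sketch, not part of the original card: it assumes the base model is fetched from the Hugging Face Hub as `EleutherAI/pythia-6.9b` rather than the local `/workspace/pythia-6_9b` path recorded in `adapter_config.json`, and that `ADAPTER_DIR` points at one of the checkpoint directories above (containing `adapter_config.json` and `adapter_model.safetensors`); adjust both to your setup. +
+```python
+# Hedged sketch: attach this rank-8 LoRA adapter to the Pythia-6.9B base model with PEFT.
+# Assumptions (not from the original card): Hub ID "EleutherAI/pythia-6.9b" stands in for
+# /workspace/pythia-6_9b, and ADAPTER_DIR is one of the checkpoint folders in this repo.
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from peft import PeftModel
+
+BASE_MODEL = "EleutherAI/pythia-6.9b"  # assumed Hub mirror of the local base path
+ADAPTER_DIR = "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100"
+
+tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
+base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")
+model = PeftModel.from_pretrained(base, ADAPTER_DIR)  # applies the LoRA weights on top of the frozen base
+model.eval()
+
+# Quick smoke test of the adapted model.
+inputs = tokenizer("Q: How do I reverse a list in Python?\nA:", return_tensors="pt").to(model.device)
+out = model.generate(**inputs, max_new_tokens=64)
+print(tokenizer.decode(out[0], skip_special_tokens=True))
+```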
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..6211fb2fcb01f4b440c3ed09c84c0fe9d3f75984 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:13d1e5d1bc973b1d45b01d4659e612d3d20ecdd4c71c2ce8e52dc63fbafaf77f +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..e24fc8cf862162fc1dea82eff4783811a547c7cd --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:480874b6dd30421568b5dbd5426a0ab17a684c053e9240bb8208c90c417e2de4 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..e6cdf36295b4d559507cf0b068680edea3de3a81 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:46513e9b1de488f3d70a4461303e6b827989f588807354e14d010b7ee4f4679f +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..4925a186b58b44a431f46ed79a3754d773929ab5 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:60072abebe6d2753254086e9e625588f78317f185bc38fa6406b734058451581 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..f7e0e105adf939707a7f445db46e66249411b9a1 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/trainer_state.json @@ -0,0 +1,183 @@ +{ + "best_metric": 
1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 1.3333333333333333, + "eval_steps": 10, + "global_step": 100, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + 
"eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.6386071986176e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-100/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..e23c869908a8b9aeb1205be9a26d051ae654a444 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b46275f918480cbee48bbc456a9de7803bd5eb19ee2c15640ec260cf9897fd5c +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..f79bdfc2ddc185146e0cff1271fc931bdb1f03fb --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7e637335bd9892157c6b664fc2d991c4f8fe81dc6e16231891170e985f65e423 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..e8b03e39b0cf81b4b723b9421b9fca8f87c7b414 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:319884e2d6c1fad0795ced8add37e8073910c77073120da512a5e6a1f6208d62 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..06fd1d13c97f1f1f22fcaf41207579844f1ca562 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8c7b0ed85318129aa45243aba4672dd190ff0cab7af3fbc863e9470ae3ee5518 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..66944a118daff2b3ae7dafcb6a1ec9d377ebdba4 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/trainer_state.json @@ -0,0 +1,198 @@ +{ + "best_metric": 
1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 1.4666666666666668, + "eval_steps": 10, + "global_step": 110, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + 
"eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.80246791847936e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-110/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..ea799181b7911608ff07796df2b9d9ee5ffe9b70 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:850a76349ae82ed0dcd99a47255e79093dd1c39ebcc9083bfc98a87641918c90 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..3493cacb8523f8be9ecbc6cf5c296964b8323f9b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f9f94a3b211fe4da15ab44ece1e5b9aeec2455af2eb69e2dc7a337e0acf2a662 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..71b7a5227226dcaeadffec096acbc7df0f632989 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3500ac793bd5f15c49da717801f854f9815260499ab4bc16b8f3a1ca9c82dfdf +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..e63acccd70ed72b56fbae077aaae3c3e860df98e --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cb852b3cb535fecc52ae9194cdcdc2f881d7fe32b722d7aefd0c2949aab38351 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..3f6b7e27d9a32c123e2b4d49f071b20f785b2941 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/trainer_state.json @@ -0,0 +1,213 @@ +{ + "best_metric": 
1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 1.6, + "eval_steps": 10, + "global_step": 120, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + 
"eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.96632863834112e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-120/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### 
Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git 
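The adapter_config.json above describes a rank-8 LoRA adapter (alpha 32, dropout 0.1) on the attention and MLP projections (`query_key_value`, `dense`, `dense_h_to_4h`, `dense_4h_to_h`) of the Pythia-6.9B base model. As a minimal sketch only — assuming standard transformers/peft APIs and the local paths recorded in the config, neither of which this diff spells out — such a checkpoint would typically be loaded by attaching the adapter to the base model:

```python
# Minimal sketch (assumptions: transformers/peft installed, paths from adapter_config.json exist locally).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "/workspace/pythia-6_9b"  # base_model_name_or_path from the adapter config
adapter_path = (
    "output_ft_more_layers_stackexchange_epoch_9_mlp/"
    "pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130"
)

tokenizer = AutoTokenizer.from_pretrained(base_path)
base_model = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)

# Attach the rank-8 LoRA weights stored in adapter_model.safetensors.
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

prompt = "Q: How do I reverse a list in Python?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```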
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..0a7be496570cc68623c19b6e44733ba5ab4ddd86 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e90574e54b2d30b589655b4d06a6cc1cf7c560d09ff6e0a3407239e034bf2a6f +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..c76a6a5f37797b8dcd02e89eebf642d43c96c0b8 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3d2a505786697056c5a4f50dfa11d5866220f1ca76687ae8d2b8f9baea0dc48c +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..b60cc4cb8217ae694c7a8efef0eb0b676d897e83 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:602f503f7cd2e84c0b6719714b66d34e98b340f44b02ba8ffc44df096e786100 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..3e6bf31460bcee7a7c8e2d3a44dae57ca9756e84 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fb3d4fd73594e642debe12e534d13131882ea6c660a00fc6f6b39408ccaa0be9 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..051b1e2ead25ac4d948e441e584e67148b0a62aa --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/trainer_state.json @@ -0,0 +1,228 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 1.7333333333333334, + "eval_steps": 10, + "global_step": 130, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 
0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 2.13018935820288e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-130/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information 
Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": 
null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..a76439a771236f0ff244c6a20d288a5a600de162 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c1513d65228260e5c4820f7348c1b103e880616a04155055a61b45148af3694f +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..7ce282e8b6f22ef33eff2177ac5888bc12e53de3 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:491a51321bef679a01cd17ef378a83c294775607171e2bc9d7ea7fd8a3198020 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..d05f19f3c7e1e4b728f62f56852d18785b6ab4d0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:03c218af617af689aa7eff2d02ae91fb859e96fcb9571b641c5e95247f137dda +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..2037cfda59389ad04c653da89dc5ece29fd8a1eb --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e78eb0e499cbcedf2065f612c2640c14e35853bbb671a435f85400df13b65849 +size 627 diff --git 
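The 67,144,544-byte adapter_model.safetensors files are consistent with this configuration: assuming the usual Pythia-6.9B (GPT-NeoX) dimensions of 32 layers, hidden size 4096, MLP size 16384, and a fused QKV projection of width 12288 — dimensions not stated in the diff — a rank-8 adapter on the four listed modules comes to roughly 16.8M parameters, about 67 MB in fp32 plus safetensors metadata. A back-of-the-envelope check under those assumptions:

```python
# Rough LoRA parameter count; layer dimensions are assumed (Pythia-6.9B / GPT-NeoX),
# not taken from the diff itself.
r = 8
layers = 32
shapes = {
    "query_key_value": (4096, 3 * 4096),  # fused Q/K/V projection
    "dense":           (4096, 4096),      # attention output projection
    "dense_h_to_4h":   (4096, 16384),     # MLP up projection
    "dense_4h_to_h":   (16384, 4096),     # MLP down projection
}
params_per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
total_params = layers * params_per_layer
print(total_params, total_params * 4)  # ~16,777,216 params, ~67,108,864 bytes in fp32
```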
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..bfef4c8cfe58600d968f2c877b10435e9e675acf --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/trainer_state.json @@ -0,0 +1,243 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 1.8666666666666667, + "eval_steps": 10, + "global_step": 140, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 
43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 2.29405007806464e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-140/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/README.md 
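Each checkpoint's trainer_state.json repeats the full evaluation history: in this run the lowest eval_loss (about 1.768) is recorded at step 80 and drifts upward afterwards, and the logged learning rates follow a linear decay from 8e-05 toward zero over the 675 scheduled steps. As an illustrative sketch (the path below is just one of the checkpoints in this diff), such a state file could be inspected like this:

```python
# Sketch: read a trainer_state.json and summarize its eval-loss history.
import json

state_path = (
    "output_ft_more_layers_stackexchange_epoch_9_mlp/"
    "pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/"
    "checkpoint-140/trainer_state.json"
)

with open(state_path) as f:
    state = json.load(f)

evals = [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]
best_step, best_loss = min(evals, key=lambda t: t[1])
print("best checkpoint recorded by the trainer:", state["best_model_checkpoint"])
print(f"lowest eval_loss {best_loss:.4f} at step {best_step}")  # ~1.7682 at step 80

# The logged learning rates match a linear decay from 8e-05 to 0 over max_steps=675,
# e.g. 8e-05 * (1 - 10/675) ≈ 7.8815e-05 at step 10.
```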
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..b803437aaf719374a112cff397565d37c84aed56 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:13bbde82f1d24979cec420c9722b545c6e0166f0a4c97f26fbf50fd90c182f29 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..a05e7ac677c85bf82d10876b3147b30e180ed19c --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1266b411f53efb1d8616b86bd6433c1d970c78f9b18e1b848600ec85683f6d5d +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..61dde1ed8b180510bbda84f0c71356862600ad55 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bdf2188bfe5b1127367f0a0d0628c845d9f54239950b10ed26be9372dba68d0b +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..2f0e2eccf4636123e13847a06c34756ea86e2ce1 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:01d6ca2c3de71fc307a0c56f31552cc216bf90d6eba21471dc5958340a2c2285 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..6b36a070386ddf5ac0aeb3504e65083a78b9d54f --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/trainer_state.json @@ -0,0 +1,258 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 2.0, + "eval_steps": 10, + "global_step": 150, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 
0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 2.4579107979264e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-150/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of 
the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..c7f5147ba5d93e0d17e97fc1fb44d0b614c4cf8a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:17c2fb1e967623307100e2c906b866c5dc440846e3117fcd5ef2f93c0281c7f8 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..44c752f79ebb8d066245645695a21041ef7c6e62 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e3e1aa386a788aa321cb31b0ad8bf3f7c3b8a2d5bd51985ef01d95885fa62ec6 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..564fc6da8e7c6b2c0f5b62f1f2e55b96ec29c066 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0f1a4ff62819275ae908067e10e49db3630270d7e753db72e5d286184508926f +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..7003714472211c91011c0e61ccb96ea383add5d6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:086780aa9b208bce3e425496825484b08be311d15b0b14460f1ffe79ca5724e4 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..f718e3dfad404ead978e3bda883debe66c1892cf --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/trainer_state.json @@ -0,0 +1,273 @@ +{ + "best_metric": 
1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 2.1333333333333333, + "eval_steps": 10, + "global_step": 160, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + 
"eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 2.62177151778816e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-160/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/README.md 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
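The estimate reduces to a simple product of power draw, runtime, and grid carbon intensity. The sketch below is purely illustrative, with hypothetical placeholder numbers, since the hardware and runtime fields of this card are not filled in.

```python
# Back-of-the-envelope estimate in the spirit of the ML Impact calculator.
# Every value here is a hypothetical placeholder, not a measurement from this run.
avg_power_kw = 0.3      # assumed average draw per GPU, in kW
num_gpus = 1            # assumed
hours = 10.0            # assumed wall-clock training time
grid_intensity = 0.4    # assumed grid carbon intensity, kg CO2eq per kWh

kg_co2eq = avg_power_kw * num_gpus * hours * grid_intensity
print(f"Estimated emissions: {kg_co2eq:.2f} kg CO2eq")
```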
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..13c355d9d7cd07751840e3f4b4b0e096fb08f050 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ec9ee2633c260b6620b38aaa07bf0efa19c3739b9025c2f72752be455e82dfe3 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..f9b414e59928496cd2bd60559f138e39bcebef3a --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:54a7e3446671114256545eba8992c7df9e791b6000fc1a84e5864fa3d3fd14c5 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..c13cd397e2cbe97d2fb9e944d382c58418c6b136 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:964f6178720317ac51eb375c889b2d86c7184aa024caf52b59339853ffae03ca +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..08dab9472710b1568137d300a028553e91b8e33c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3f644a18bd224b289837e8b255d1bc77f3a61e9f5228b89f6d995c806adfad9a +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..392dabce4d3fa8d21cc4cbde870f48c77c70abfe --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/trainer_state.json @@ -0,0 +1,288 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 2.2666666666666666, + "eval_steps": 10, + "global_step": 170, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 2.78563223764992e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-170/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- 
**License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + 
"layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..693a3d723e8f6b7a9bf4f42b123f0ea76fe91470 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e76428e63ef22f402fbbbaf579cb6d1a73b26284d9487c502486f60c7d0196dc +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..39362320602419fedb33ee5465aa0241cd8067b3 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9ce180abbc01a07bebc7bd5f383b4896dd9a483de5f0cdab070548919297f7ef +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..fdca3aeb31ce5b4aeb2c0f2ba53e3e43b6334331 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1b79baa0842c2916b082cba36f9f2b958210e6d7c1813742841fb908cae57fbd +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..29c6a3e72ef015195c79fa565499604a99c3238b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid 
sha256:f03b6789f6fc5c23fa97b574dd3da7e3a4ab78408a43f9c087ae8f29ffcbbea4 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..559628641d5bfd4a0abeb32c37a99a7f943fb16e --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/trainer_state.json @@ -0,0 +1,303 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 2.4, + "eval_steps": 10, + "global_step": 180, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 
1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + 
"TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 2.94949295751168e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-180/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..8cba09847b06af38aa91a476b53d84d9d61fcdb7 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:eaed631a60fb6aeda52858c30ff1c51b996830addc5e5e1d33ebbacc71e75741 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..80a7afe7507fdbd04d22ea8fbba69dc712ad8163 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1499effd55b559d04c9758a960a46a0d97e35cb0f05697f6093fa134bc102185 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..ae44ad6727cf9b3af903ea84902fa6c7f13a5a95 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7d6f4346bdc8a12fcc48535a6002ac46345e4ce1e14bb1f7e9dc3b0ea920641c +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..b138eaf1e7294b95b63cf0f5e77aaa5651aa7871 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bf989c6683600055d8c21edd8b036fc8ec3870e69609cfeed852c01d4fb783ff +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..3bf7ccd04ac46e3e93ba68dc86823e218c8c7064 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/trainer_state.json @@ -0,0 +1,318 @@ +{ + "best_metric": 
1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 2.533333333333333, + "eval_steps": 10, + "global_step": 190, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + 
"eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 3.11335367737344e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-190/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..890491ca05add5fb484ce2b5deb4c46c5f758be3 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8ecbe8a0b25e82be43b66255538d88799228af20cefe842eebfb79379c2c8177 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..17e2d7709e4193cee1b1eb66bdcbd65cf360aadd --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7561a55b878606dcdfdc4bec85a3cfbc8e036ee6ce47404fd6289bcf2ec49cc3 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..fe515b4492af517bd45c5a5c7abbba2b94c5ae37 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a5087ba42b4dd9dc68875c89890b692068c71de7009ff67cb7d8492bce11049 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..1133483a71406a0f78deaf017d442d6e7f56cc47 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9ca34d98cc8cff1e4ad458af4dd8fe80da454f52fb4158b0577a2a821fa94a95 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..28ab3e80ee16d24278842167d8a0550883d491bb --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/trainer_state.json @@ -0,0 +1,63 @@ +{ + "best_metric": 1.781802773475647, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20", + "epoch": 0.26666666666666666, + "eval_steps": 10, + "global_step": 20, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + } + ], + "logging_steps": 10, + 
"max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 3277214397235200.0, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-20/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..6593e6eaaaceb51dd7ff719b48b4ef643e2743cd --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:711adf77fc7adb6e8a419d069e867854f595178c07a603fabbda6bbce821be5c +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..43087f0c828033849e412c2b387859b7380c2b01 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3545a63155ab62b6db3c9914f5dac2d2407cb5ac01502c7059e3322e0afc29b9 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..da263858f32b7536e68a33626ef41e3ef7a44689 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0dbbe288070e588c7effbe11249d330a3ad16131211e6b5dff1d03a8ebc7517f +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..b0f07f929fecc267165555e571b711e64132e6f7 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7f688f9bcdc66390bb2581a665773530bb1a08ae9bdc1413898ff1347742f43c +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..dc7a01af0a87b45fcee84e6a0835c4fc05c49ba4 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/trainer_state.json @@ -0,0 +1,333 @@ +{ + "best_metric": 
1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 2.6666666666666665, + "eval_steps": 10, + "global_step": 200, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + 
"eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + 
"should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 3.2772143972352e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-200/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..2a45991611c70c82b1414519b1838f0e561587b6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0ad82181963dea7cc7bedfcaf66a85c3f6ba5540b5c9930755b6579ef77826a7 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..598cb8d9f5d5cffb5dcfb308ccc4b1203ef13c4e --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7433c94fadc65f014ca12e2d39feda64f3b3497d43047e42ad0367e9441fe5e6 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..605214081e6b3060d6c3e526fc86e8b8fff3c71b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cd4e0019fadc179e2ea531ff33d86db759cb80e64a8826bb6bfa90c2483bfc04 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..67ce491755f9c5f753c42b9ea33be81dd9f8a131 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:085996a8194241f2fdb489a02e8b9252649984b08945b06670bf62ad13b831de +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..a7cefb63e35023c0c66a949a469e47c4767535d6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/trainer_state.json @@ -0,0 +1,348 @@ +{ + "best_metric": 
1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 2.8, + "eval_steps": 10, + "global_step": 210, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + 
"eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + 
"eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 3.44107511709696e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-210/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..2bdd3d7d803a16cbad0ec6463d76b1e984d34e45 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d717813abcdf553c24157763f2bd16ada5a03de019bd5f0397c4be418119abe2 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..c2f60daab4be8338022cd8cbcf46c8c3cdf74484 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e07204e9d8397eb9021047865fbac3042f34586a95d5fd416a1f242288aa2779 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..823c878e3ad7d7799e1959fba97c90aaf79af4f9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f5e4256f7b7ace2dd6194570c191ab9026456dc0db24025edac4a5bd9e379dab +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..ff86f52b850328c2d8f4d1f372c5649cff03e543 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:afebe0583be455041040155b8cf242cfa36c2fd3aed81f0bb042574b3d11c816 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..9da3bb4b87245d7c745eab0f0a42ebefa797053b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/trainer_state.json @@ -0,0 +1,363 @@ +{ + "best_metric": 
1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 2.9333333333333336, + "eval_steps": 10, + "global_step": 220, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + 
"eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + 
"eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 3.60493583695872e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-220/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..46cf9ce151a13b1742bd910a5952d3c3713b33ce --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ab464f73f974b9d5baf87f56b8f158cc2d9515991c02ce257f463302a2361e6e +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..47b231a7fbc12680e999a292a64290b922f753af --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9dca49dd4fe34127bb876f6663ad02e5298c21308ee9ddf6a786f9d800d373d6 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..ae85ad205796b2c3955218eb7b4b348ca35978c7 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3e2b38199e26ee1965ef79aea019c0217039e7dab109a4b6e29c57f1bea63d6d +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..b0031e0c496cf5abb25ed78d6e38e07a220ce043 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6d1356fcfd01ae5906aac09e7ec62f27b1afad01eacdcb84f123c2c4502886b8 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..8bae8bc99b81b404f2fad2a0fbf96ff5e4de10f7 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/trainer_state.json @@ -0,0 +1,378 @@ +{ + "best_metric": 
1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 3.066666666666667, + "eval_steps": 10, + "global_step": 230, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + 
"eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + 
"eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 3.76879655682048e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-230/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations 
+ + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..e6a592742f48415c89e7577bcf8eea01a6e442d1 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2578e1255550a543c5d0eb0d178cba4be89982636689128c3216e97862195ff6 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..f39d8de7f435a6a55021a70d248ba920740220c4 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:07ad62b9f13dc6c5e4e4f5d84d6d6365dbd2fffecac9f140fc80ccf70908c993 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..846c31e0418b3b3196b4e9c5d730a866c947d1d6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:33d7857a6e3603508425c326c1a1dee439799d2c72bbfc8afcabbb8578757780 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..3e7e86f4a758107d182040d0a6b86e40bcc72f53 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4730dfe02541191e7acbe6178fa9907735bc80cf60dc356460f9a0ca3075aca2 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..5e8ee51b4de3182d0e4daa2489cf9c8bd8641fb5 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/trainer_state.json @@ -0,0 +1,393 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 3.2, + "eval_steps": 10, + "global_step": 240, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + 
"learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + 
"grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 3.93265727668224e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-240/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information 
Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": 
null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..569320b0bc1d758638c3720b62b20ff398cfeb84 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0a448a4ba7794eda76fc601a920176192d7007273951c8d584f7cdd7bb2559d4 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..43247791620a3edef3bd249546bb960604e31f58 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:21345faeae09ed5fae6a6024556fa5b885072913a773820d6ee55288912e836f +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..90df82c0a610ae490c2592c79d46fe23cde8d351 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1a5b7a10b9f8de84d4eac8f0b5437669695e0a3ed004e055b39340577de17c55 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..dbd2dc1560f92e541e8ba7bc84bc9ce93c9d1a8e --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cac52f10d00ea64a7e134e5021b8e94e73bb2935342d54982a8a3e42dea214b1 +size 627 diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..456f28998b8e05baa51f6ea675eda353b9cfc9e7 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/trainer_state.json @@ -0,0 +1,408 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 3.3333333333333335, + "eval_steps": 10, + "global_step": 250, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 
43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + 
"eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 4.096517996544e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-250/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/README.md 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
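As a rough sketch of what that calculator does, emissions can be approximated as the energy drawn (power x hours x PUE) multiplied by the grid's carbon intensity. Every number below is an illustrative assumption, not a measured value for this training run.

```python
# Back-of-the-envelope estimate in the spirit of the ML CO2 Impact calculator.
# All inputs are assumed placeholders, not measurements from this run.
gpu_power_kw = 0.30     # assumed average draw per GPU, in kW
num_gpus = 1            # assumed GPU count
hours = 10.0            # assumed wall-clock training time
pue = 1.5               # assumed datacenter power usage effectiveness
grid_intensity = 0.4    # assumed kg CO2eq emitted per kWh

energy_kwh = gpu_power_kw * num_gpus * hours * pue
co2_kg = energy_kwh * grid_intensity
print(f"~{energy_kwh:.1f} kWh consumed, ~{co2_kg:.1f} kg CO2eq emitted")
```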
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..81b1b28648bdac841ad01395fae8b9b615878cf0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:59402aa4f0379d587ed199379509380fcc6c692d3103781ed58c8f41d4e00884 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..4c980e7d1c278392dd29b825449732c0650a9c0d --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:18b500455bec97babd7caa7b04330470e431fee23b828132257549bef8572c5c +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..293d181974003fee2540af0648cfb4e42786ca56 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:78bbc69e88d5e1fb15138660b4de76d03b9476fa1ab2d16370f894a65eab3da3 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..567416c89fd9a753e9c546d0aad5a7def6cb0a87 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b657575d4c4b88ea1f026453a75127181cc1a57cd8b2b18fbc58fb11f11024f3 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..ecd118bb117fd1bfcb5474ad2c89c46de6d1f9fd --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/trainer_state.json @@ -0,0 +1,423 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 3.466666666666667, + "eval_steps": 10, + "global_step": 260, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 4.26037871640576e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-260/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made 
aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..26b940f28545a01c03ad9cd1a251a9f6eac395c5 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:654e07ef022b47ff53350302203f06462db70760d32ee455dd677856a02a33cd +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..c716bea7359c87a7bf5bab6feb03bc06e5e2700a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1d4b681898221b09faf8da53461544f7836a98964325941c710d80ce88565c7a +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..ba62c782c818c1b90b0344e262a00bb91255dc87 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2af2c0de08ddef877a4af0e5f2dfe4570d2f029659f125fbfe3bbcce3a8b09e6 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..aa7eeb3a87ea1ca1e7352f08759f4051aa8258a5 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:abb06701181b90f771f65b0cba814986a9b7ac92e4591d8a01aa1c6efba77ce1 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..97d6be5419c4e09d90b5451c0e59ced580832fd8 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/trainer_state.json @@ -0,0 +1,438 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 3.6, + "eval_steps": 10, + "global_step": 270, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + 
"learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + 
"grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 4.42423943626752e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-270/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..4c6373e0fb75ec6d6518592214e9f56318b697ff --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9909a0d05ee76ce4c13f28a037945b17fcafaa087a11d2d0d71b8951c647f76b +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..373eea3bfac4dcbf4b095d3ad14874f743178033 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:63d7d05bc093b8136b134922822999916a295d32fefd7c783e260d74b61afe01 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..1702f62666b39cac633a34cf312f24e311e13df2 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9ba79aaff190fd3ef9f70dd7c0a234665c2bd6c6bb243b5896c5bd6a16356627 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..4a657fe9e903dd4a89dd04673e22a17c86dbad42 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:819fe3dcc18383ccef3dc2fe508ddac3e00702050d5d602aaf094a47cfd838bf +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..fe4e70aae5203c0e7bdb942be0ca3550d2479dd2 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/trainer_state.json @@ -0,0 +1,453 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 3.7333333333333334, + "eval_steps": 10, + "global_step": 280, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 4.58810015612928e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-280/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More 
Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..820cba80b5964b56b7fd6a9c0f6c0abf7f8c2c74 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:63e0763846ed36ffae68f60dc92bb5d2528a9ed39a3b7b07470088ddd577f7da +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..0bfa946376f3d170b8327c3a32c224cdb73b0a93 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0407c172e711625d21af68159d49514abfff9d1ea9cc04760b4b36b70567e812 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..fecfedbf1488a31afeaf7c01dc4f9760cfff1b16 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:47c6345b8afbd1f7a687e942ce33ce022660a29cb46a23e4c9eda9e498053741 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..933185741fddd4e80babf7cc8a77cff149bbf3ff --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1b158d5ddc1d2b97945724e562a4c35df5c18023a4c36f7bee8f109155034bf0 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..79bcbe3439eaa2d397a0f9a20dff687804fec828 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/trainer_state.json @@ -0,0 +1,468 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 3.8666666666666667, + "eval_steps": 10, + "global_step": 290, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 4.75196087599104e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-290/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# 
Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..f0aa05948d037f64b97d97647f49c09a3b4f94e5 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3daef6c02acd829a2db21974d2b67bd763cdc950d1cc1f323854aa0522e34f4a +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..e6b2483d45269d4777aef5eba2429d8f081322d7 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7f452adae64219f6d6afa94ba4a9df5f159e279d457e0496df25a449791734c1 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..76ee62462f7b8b87edaf24539d12d81995c70164 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3a5478e4e53ebdf948038ed344f6e976416991ec94630cb094a18d5adf7aae7a +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..c1688f17bba99489f276d752148e5fdb1a51de4c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6e0a57dd22ae44f20eadc48df05efcdf32876f538dfa9ba64823410daf518f90 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..308ee959a55200efdcd7fc37c827a84ff36d7ce2 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/trainer_state.json @@ -0,0 +1,78 @@ +{ + "best_metric": 1.774838924407959, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30", + "epoch": 0.4, + "eval_steps": 10, + "global_step": 30, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 
0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 4915821595852800.0, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-30/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
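A minimal sketch of how a LoRA adapter checkpoint like this one could be loaded for inference with `transformers` and `peft`. The Hub identifier `EleutherAI/pythia-6.9b` and the local adapter directory are assumptions (the card only records the training-time path `/workspace/pythia-6_9b`); this is not the authors' official recipe.

```python
# Minimal sketch, assuming the base model is available as EleutherAI/pythia-6.9b
# and the adapter files (adapter_config.json, adapter_model.safetensors) sit in ./adapter.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "EleutherAI/pythia-6.9b"  # assumed public equivalent of /workspace/pythia-6_9b
adapter_path = "./adapter"                # assumed local copy of this checkpoint's adapter files

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base = AutoModelForCausalLM.from_pretrained(
    base_model_id, torch_dtype=torch.float16, device_map="auto"
)

# Attach the LoRA adapter (r=8, alpha=32, dropout 0.1 on query_key_value, dense,
# dense_h_to_4h, dense_4h_to_h, per adapter_config.json) on top of the frozen base model.
model = PeftModel.from_pretrained(base, adapter_path)
model.eval()

prompt = "Q: How do I reverse a list in Python?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

If a standalone checkpoint is preferred, the adapter can also be folded into the base weights with `model.merge_and_unload()` before saving.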
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..90116c4d65bbb3eb9e0326d3023a29b1183d2de9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:84a6a28735c5249c4589369db3a9aa6adb960292a2a75fc6b0b199478bc2a99b +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..ace968513669f212cbda61a42921ae53325b9c8c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9ae22e73b61bccba5f1a6b72bc6c3a770f25bb0a8ac8340a17928d2ffbacb8b5 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..4d8ba268ef07796e970a23442889935701a1dda5 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2574c6149307e492ef05d2031918a546356cc654f4671c817f05ae6d0764de7f +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..4ff153ed07693f90bd9914df3d3a9100087ba152 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4b3861b9f1c59dbd4e1ff81c91a81a79174248e3676d03815d832ad9defdfa02 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..18a7f2ea66cbb086cf6ae2b98127a820ebf0a84b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/trainer_state.json @@ -0,0 +1,483 @@ +{ + "best_metric": 
1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 4.0, + "eval_steps": 10, + "global_step": 300, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + 
"eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + 
"eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 4.9158215958528e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-300/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..0651d5f60899ce9add445120cba3836c8c9a6ad1 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:377b4e80dd2885fb3457244b027c234fba7afd3fe2d8e99c74e3acc55927528e +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..df2dc625a25504fb8ed150905ad0b03b5b371a64 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e7e153e7910620bb717b152f015475b17b7a740f948509f41ccbc03fce2ee5b +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..a5b4503b006d8dec33c7a086d3d007eef4282144 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a82d768c5f5c231c8b50481a409281b8639e231a185281a7476164488eb6c27f +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..1e086272f5dd0984e6962757f823d9f2aacaf771 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b111834dc6d7b66a1fbf92ae20f97bc4a522817eff7fb700836a612cb5f0fc0a +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..b3a871f4859cd628030d36fa94e08470935005d3 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/trainer_state.json @@ -0,0 +1,498 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 4.133333333333334, + "eval_steps": 10, + "global_step": 310, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 5.07968231571456e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-310/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..70b02a25d7b631abc17a510f8ba7e3fe6550c9e9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5590666574ebecd439c9581453395ac8450841ed11e127eddae72efe7f8c8495 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..15980b705f9a0cd15e0fb8dc041a19f3d2e9bd42 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d63e8f4692ae4a431c1214acd46f4763edc4eb1ac7e1d2bd4dc7154ed663aece +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..f5fbaf3739704eea759ab29b4b9eba0fecf79ee6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4f581763059f9808c6971d543bee5e034fff1a9ec174cb7aa232dd9f17099da0 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..8ff1bd4578dc870e2a3afedff5bbe205511a2728 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3a23700154863dde7a0e4e14f4b3f89d761bab8e168d94035889b30e583b3ff3 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..487283536b8e9e9a815852165c893d476559b7e9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/trainer_state.json @@ -0,0 +1,513 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 4.266666666666667, + "eval_steps": 10, + "global_step": 320, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 5.24354303557632e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-320/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..19f942e602af9d5d6fe94e7379d71c896a5d89a4 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7068b773b956b7b4e9334743a67e832b25fe89ca93bbd4bd085620ad81f630c4 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..4945a87a52f1067212466316d4ab4443961df900 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e582ade2f07dfb6ea08be4a283ab6e60f9a3c197aa9ee2981a30da2a36eced9b +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..759bff60bd0897427bf9d4410df520d35fd20081 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:389caf1bb32aae3a751e11d63ffe273f089df59490c4ac6e5883d944b329df0b +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..7083ad6a8ff6d908a0f70041403b09c06d781a9a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:392645a8f65cd22133a11ca171399b5f2c0f4566a55089c7b4e28e0722b58e63 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..381127d9540eb1b366e7b0c7a6ab202cd6cf5baa --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/trainer_state.json @@ -0,0 +1,528 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 4.4, + "eval_steps": 10, + "global_step": 330, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 
0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 5.40740375543808e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/training_args.bin 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-330/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..d5c1dc5ad04df4909ce6e68e037b303665b1fcfa --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3cd2db0196c680c44744c84f67aa41896ecfb5ec22526afb2a16c6dd72e31f10 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..6ded6a5116f7c4b6cb4ea00e02aade30b1b55291 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d66a80c941b4fd7b073dd097a09c92d443c13ad9c92ba1343d61e7e88e7b8fe5 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..4d7fc830aabf2c4827b0609ed6e355d0fa80523b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b904f845552beb994fcd34362e728f918c7473ac27288d463195b51c3ed73bff +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..5cf759521d751f7119b438bcfea1dd3061ad770f --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bc82482262d5bcc301d70be7cc20de5505fc4951ede6a4bd1bc8eddaae70ec6c +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..5e527671b6746282f404ef93996a57b015ac36e5 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/trainer_state.json @@ -0,0 +1,543 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 4.533333333333333, + "eval_steps": 10, + "global_step": 340, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + 
"should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 5.57126447529984e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-340/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..d8a6980a826ca2ca31dc1f05b2bc225b017c3002 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:667529357b4b159ee4122f14373f498ea7b537d63800220e1d9358522d835e95 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..25cba3b6dab268e31d73acbf6c9e98f65f972a36 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0e016e1dcb21a3b9765c678aca98729c09ddefe358e9b0b85b19d044749e3436 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..fc3bb37d365dcd8ae3528d8e7242f7d2eae755b3 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:39cd0c0a4049d541d90e7c6154cb21167a341830884ad3558195617942678446 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..122200785c30ac91ca1700a0c02eb0676b334191 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:19a930731f7e25af6e8ccaa3873041888b8bd1ccd468f5d25f665abe8de2fd71 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..803eee4c0ce0505b8faba7f6487f4ed9374bc2c6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/trainer_state.json @@ -0,0 +1,558 @@ +{ + "best_metric": 
1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 4.666666666666667, + "eval_steps": 10, + "global_step": 350, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + 
"eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + 
"eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, 
+ "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 350 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 5.7351251951616e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-350/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More 
Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + 
"megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..fac49f073d3d1be91ae21836a47f6648786c576f --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6d438074071c9088c3f9ad9018e1b1573c3a0c961720e4059cce0c4b65a198ba +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..c9f39ebd0662d46f19d4975530e90dfa383799e8 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9264cd1f9966cc382e5f3195365821667b4a9df6c0e36209384ecd5b5561a2cd +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..dff7e422d3f8fc71ea77fa33b28878ffbe8abd43 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9d73d43b628bfbe3f56e29099c04e9e9584349f935d8148aa8c34849bf03ef49 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..47758cc7cc6a0980057d11798100d645481f2999 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d7e1fbdd565ffeb34e2c7d3acf55169d38b4c137b14b50cad9cfadd1b6da2270 +size 627 diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..3e662af7c4690425891192020df56486409db377 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/trainer_state.json @@ -0,0 +1,573 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 4.8, + "eval_steps": 10, + "global_step": 360, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + 
"eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + 
"eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 
1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 5.89898591502336e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-360/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/README.md 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..451e24dc533286453058fadb1fe5d5e1b57ce3a9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:73c5105a6669beeef01ddfe8bc69b592466558819658fa35c60ce31f6ecbdb11 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..1ba51bbfada1b1cd43186a316e8d098e4c7d4361 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9f0a593119877106e71a98de2da04c09817424cf05f1260b07e91008f933f9f4 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..792417d4c800bc4c8f7eb21d5421678309a6165b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d7c0e313f3d6f9e1adc7603b9ffa6f0ab3438f71ce0c71bd9a788485d02b981c +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..88c3cfde3314abc3cb7ac1980343fd522476763a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2e86e66477f9e25870a83a9e033c8901d333cd688bd5691a4877c5a0230d0d4a +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..1c8b0738e2ee30f6ecff0f0f30d32f814c804236 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/trainer_state.json @@ -0,0 +1,588 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 4.933333333333334, + "eval_steps": 10, + "global_step": 370, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 6.06284663488512e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-370/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and 
downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..2698f92d38573ba4ff8347aab0eaf4d3aaf04330 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:881e803ba073f1fd5201b5930d566600df8c164ca0db0f4a8c4d0861280d20ed +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..44d697a6d2b31b324c0fc41af2f30a6f647bcf33 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:87e8b3aee1533f991185e1bd3cf55ce4bbd27c05bf6571d7ae457c7e45fe149c +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..f3b952e81c9ed8c37528c0b9d4c13811ac0b62d3 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d5ce5744fa32738c65fe7785ec589c49d96370233c9386567c3f06dceedb5f2c +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..ea6d1fe07e7a7f3419271a4398f4d565fac56daf --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:28ad45095a470f29f504444b984a4733314b535844dcf63615c975f42bc45645 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..67fbb803685a606888a5f978442a26130975195c --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/trainer_state.json @@ -0,0 +1,603 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 5.066666666666666, + "eval_steps": 10, + "global_step": 380, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 
0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { 
+ "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 
310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 6.22670735474688e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-380/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..9c0b5bb055dde81801ce0fd291c2c5140cf7fc0d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f050f686f61ad0c0a6169015c681919283d00fba4b45bff48000e7d458f4cab9 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..428ef7504a5aaf249629eb7820c28c51fd831f37 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:268425c4911ed96f103d5de971cb659b7cd49b5211e608c46853596729f21e99 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..b458d8885e612e71d79c420d6ca3a40dcdcf7fd8 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f47a6a8940dea009f3b7ce239248233dd458275df17acc4fa8ff99eb346e8979 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..a3167c6ee90dbb0999e6231a45db04c9a87f2438 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:530ba9ced1f0c6356984abaadb7284d453a1e0d0e6b99e19b8577f2961322824 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..2ff777fa8204ed5d3f413255fb4bcaf135a5664d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/trainer_state.json @@ -0,0 +1,618 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 5.2, + "eval_steps": 10, + "global_step": 390, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 
0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 6.39056807460864e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-390/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** 
[More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..58a0f6c9767a6b90785b19ced1d17c02c1484c86 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f1bb80b685f69aff990a220bf2b5c85b281a7ee0eb5cc23974ecf6614759ea56 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..097c94127998301160428bbd1a6cf0e70030d085 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:439fc7a3aa280bebaac849857788a7ce17e46cc0984866a7a528ee96899948ee +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..cc0cb9030af17e56f3ab00fc0ad6850b4636069d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b5fde33a4ff115b0a519c0ef179183e0540c837c91cce3dba97312fa8e725570 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..58deb56e794147e34aeb0fe0ef2b4104cec38f91 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fd6339cffd371f25895fb3ae7c3bd2d834564d5a3687aa67acf7e2d38139a85d +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..dfb366e55629bb7891939cf83968e75580a9a454 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/trainer_state.json @@ -0,0 +1,93 @@ +{ + "best_metric": 1.773160457611084, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40", + "epoch": 0.5333333333333333, + "eval_steps": 10, + "global_step": 40, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 
0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 6554428794470400.0, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-40/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. 
More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..03dcc6d7bb075315e9c709f2bdeb198d700607dc --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7df4a3fd933ffd2f6ebf708e34a3dcf736a219b036e91cc3b2df93f0f61106aa +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..d9a4c45e69ac2f9c4b66e96de0c7a4d5ed7f0ef1 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f8c355335f58d4cc9f8d2aefccaa53dc88b1c693cc35714f6e14cddb8ea86789 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..d06e3c475517e0d14c13a6ccad84a3f20110949a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:96f529f9856ab8a411ac6b8078e33cfc18c0159c4947cd8cac8e1238fc1754c7 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..c89c5b2f6cded5d6811637afffabb012ab4cce08 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b50041001a4c0948a9bbe22413d7173fb07b6c43c8e9fe09c797f707e3f9df6b +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..b726cc7b6e48be956eb8ef649574723c09bea586 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/trainer_state.json @@ -0,0 +1,633 @@ +{ + "best_metric": 
1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 5.333333333333333, + "eval_steps": 10, + "global_step": 400, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + 
"eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + 
"eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, 
+ "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 6.5544287944704e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/training_args.bin new file mode 100644 index 
0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-400/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..6d6461317e50ec167f48a61e8ff816118ccc9a74 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4867c0de45da672d9381dcbfdccfd5afab011ba01e6b472f5d913073835877ad +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..cb7b71193114e3aabded1aa7d2f2b77f43f4c075 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2f3313176acaf3927176fd6a7b9c72417a4db9429eff451e772b1d7c5ba7675c +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..090a1de878697aa3e6255ed23ff26ce6e561a9fa --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2cab01f3c0a9d66cf16eec91d8aebbfd533628e45bdb849b4c3e4ad317f15270 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..a8aad2477f1bc0cb390431f140511318b1528bfd --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cccde3832c2b4ea63ec7404677790f5321d724832eb0e928329bc1aab3b98de6 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..8558bbd89f125c4242f626d2d6c852f703a90cfc --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/trainer_state.json @@ -0,0 +1,648 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 5.466666666666667, + "eval_steps": 10, + "global_step": 410, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 6.71828951433216e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-410/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/README.md new file mode 
100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..b5b82ab3e109ceda5b92c9fce6e1ab40068b5039 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fac9344d53061c8bec54170e7ca34ffacdee044110fddd196a6b518dceb7d6e2 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..e6cb5c8dd362dd4f60b6e174bd837d3ca220cced --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:33d571acae636cabd669ae820039f318fe8472518935e9596283b6c7f2100f20 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..7c168ba589ab149907f65c12980a55da76890995 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:02f02c3c7264962c7bbb05c73c2c2f9530a34cf2c29d550cdc787ae19eb6d9bb +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..153a98dd2b8da55b9d59452b03c6974c79d9edfe --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c2b79027e03c03270d6969e59f71110d92681760bafd2277ab12e1ada9945e25 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..050c54449742169288319d7b5bc47912a0c59cde --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/trainer_state.json @@ -0,0 +1,663 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 5.6, + "eval_steps": 10, + "global_step": 420, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 
0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 6.88215023419392e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-420/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..1cf6860c3f74dc8d0737fda252e6180635ec935d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:33825af7678be5704d92244acef0e3a10cce42d30b0ab7511f62872e4675bffe +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..b2b582e1ece3680e2d2ce2fb500ddb1cd91e57e7 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:263457d07251c98e6edd922b444f6a056760cd9909cfb20a28eee061f7676179 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..eb08c850753d158caff59458c0a4d2fa22ad5de8 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5f5c1faf0e9eb010c64f51b35236463635709da903fff7194839666558e862b6 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..84208c5f196b19c57eb937efa826614bbbb0d731 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7338e38814f9f25087cc602e36f3da04984105e2e1289ab7ef2e3552cc8f0f01 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..d65ce3a9739bdf945d8d41bdad73ad011d291d50 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/trainer_state.json @@ -0,0 +1,678 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 5.733333333333333, + "eval_steps": 10, + "global_step": 430, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 7.04601095405568e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-430/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
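Since the fields below are still placeholders, the arithmetic behind that calculator can be sketched directly. Every number in this snippet is a hypothetical stand-in (power draw, hours, PUE, and regional carbon intensity are assumptions, not measurements from this run); it only illustrates the estimate energy = power x hours x PUE, emissions = energy x carbon intensity used by the Lacoste et al. tool.

```python
# Hypothetical placeholders only -- the actual values for this run are not recorded in this card.
gpu_power_kw = 0.3        # assumed average draw of one training GPU (~300 W)
hours_used = 24.0         # assumed wall-clock training time in hours
pue = 1.1                 # assumed datacenter power usage effectiveness
carbon_intensity = 0.4    # assumed kg CO2eq per kWh for the compute region

energy_kwh = gpu_power_kw * hours_used * pue
co2eq_kg = energy_kwh * carbon_intensity
print(f"Estimated emissions: {co2eq_kg:.1f} kg CO2eq")
```

Once the hardware type, hours used, and compute region are filled in below, the same calculation (or the linked calculator) gives the figure to report under "Carbon Emitted".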
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..062a666bb85710edebc2e5daae6e6689c9a524cc --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8a887b1f6f67a955c5766d7fcfbfbff85d1fb114584522742d4c6f27c8a2104f +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..c35a37622e6671fd60c0e85ce4932e79560ee9ef --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:099e5db28c3057de4047abf7cc96bddc15ec594dbcaa86d492e56134f6575eae +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..5fdc5e50e381540856fecccc6c375074d1aa7b0a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:54abee51bb88479cda4bf77e85c2a545e7fb3c5e42f56d1baa63f1344dcc0529 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..c1a6c4449818d1929c2f2741b38d73ee22de980a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:21ef26d2052defa60f2a77e2a62aa2462763fed87bc956b31ae8baf918519768 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..45c2583d33cd1d149abf8c7b50090b0a611b7763 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/trainer_state.json @@ -0,0 +1,693 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 5.866666666666667, + "eval_steps": 10, + "global_step": 440, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 7.20987167391744e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-440/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..12821580ab15bd679d36c399d5354c45838f7d04 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fc51ee9b90db1fb6004e862736b2015ff32febc8f9285f4b1f912903c9a6edb2 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..c08bc8411400cf92f5286591354d461ad0daad04 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:999cd01781c36fd04c73e44d8bcf518f6ff3d684f9d0d712c3ec42a12b1d9f22 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..3e7c44b011328e871a23ca1fea7cc6ea78d70a29 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4cc0a8131f9f14b855b33975c5e795a94be3a332a0f3cf68a9ec3ab6ce73b177 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..3d231a40f14c348f3b5b2d7d5e332cf86c1516cb --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:73db0bcad7cbe68d6ff70345c93b71c9aa35aa3d7d75f30e67b9806d7161141e +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..e2063a3ce510d4f83a829d64673db4cbdae8e895 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/trainer_state.json @@ -0,0 +1,708 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 6.0, + "eval_steps": 10, + "global_step": 450, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 
0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, 
+ "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 7.3737323937792e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-450/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..f8d4ee2b6a07c195cea8bd7bd08e481689fa4ff6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f54c7dfdfd95e7609671c530914905bf4530cd767e863eb00fa7fbc870f556db +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..5018cd2e9029b8e8546ad9931d33bd1168b2c64d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:51f80b68f3adfe37ffd9f37569ad95ef06d5d2a3d5393ae98f9b3fe949c33215 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..82f7415495fcd1c3ffb5dae79c8c3a4c2269faa6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9a6424cc1a4d391795fbea6a94823363dca21ce0e7ec6c433e8cb5b0aca0060f +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..f8e13b071b99f821ef5c5fe852e346d29c41665d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cf0b498f1d599df6d7a0071a80aa54cc9365ae94954efa11a5a6479b556aa4b0 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..acd59e9d036d008cc3cfaf1b1eddff7ab764b2e7 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/trainer_state.json @@ -0,0 +1,723 @@ +{ + "best_metric": 
1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 6.133333333333334, + "eval_steps": 10, + "global_step": 460, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + 
"eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + 
"eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, 
+ "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + 
"eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 460 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 7.53759311364096e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-460/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More 
Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + 
"megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..46600f76d7bb59d05a294591f204ffca79154d19 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e3749cf5e0cbf83380f0fdb43bd755a1cebbde0f41efa38617a796911631febd +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..f8941f393101657538d97a9ba3f34dcd2e5cb716 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f8845c618dedd261ec1d3b7eb0069794e7f1a19fa9efd6a8b28a9f2241ee1836 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..84ca1f63cf231e2aa1c43b465c46ef11c80bc867 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:03fc4a1860f68759a4d7833f4317681e377d4e71cf91ab1f091da8cd71579d26 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..883b49539018c4d63af06c6d2d24da3c4725d1c0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6c3ece07e7f2dc5ed7052a5507b5afe899910ec36fa39dcdf14492b23751b954 +size 627 diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..eb1ddf7de242ab8253a6de92e3e174ebe9158bbc --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/trainer_state.json @@ -0,0 +1,738 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 6.266666666666667, + "eval_steps": 10, + "global_step": 470, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, 
+ "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + 
"eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 
1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 
5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 7.70145383350272e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-470/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/README.md 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..52b881ee9c6e8ca77f7df1f4b69de05382e12f2f --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:008c6e879b30e59bf4332abb1a283fe6d8f75befeca66a5289d2616a3b3d8d80 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..0e4baf5465478dba3afc5dd88591eb48f122a4e5 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f8ce7c0af5e6e52e9e7b5fde5168ae5290e438d509d8ef86d7ec5bd5990b1e7e +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..302025be6f88ae472170fe5d230ba39d4ec976df --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:918d6ec8ede8d7a880512e2fc44b16d7c22df85e8b411a004d142edcf446c40d +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..a5742081024877b441baba8877d29e5d9f71ce2c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:63e6ae2e8eee493d6fed2392be3cf35a66bb349de2af58b077e3f04b9f1301d2 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..958a6ba10bbb4bb67cadc118c1556f4b5357089d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/trainer_state.json @@ -0,0 +1,753 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 6.4, + "eval_steps": 10, + "global_step": 480, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 
0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + 
"eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 7.86531455336448e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-480/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) 
should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..fb12e2961355285cd3196f5a652d23d95e73dfa0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6c1cc8bcb219650eefd0d976cb18e47833d1d137e073b6555b6a841ef931a10d +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..232096ce42ced5bebb8373ea309400ab749dec9c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c642fc7c9433c2099f107c539f7c7c4fe308df35eac9ba97a577825105312384 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..031b265de35950a615eacc2c86e46292f552e541 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b56a3ff26dded8216d560cf73ba4817b5973851b78edbbf6aa9d6b515761df8c +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..939a3ce5828bd280143e967a31711c43d53f2892 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:33921182c7f28097fa6639877c9ff4014fbfae9942c1efbd1cdf2f870c3b959a +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..22458b8fa418afadd445b13adee866d75eef23d4 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/trainer_state.json @@ -0,0 +1,768 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 6.533333333333333, + "eval_steps": 10, + "global_step": 490, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 
0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { 
+ "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 
310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 
420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 8.02917527322624e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-490/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..d207aebdbaa929cff17c8de9c2ecb25b802fc42b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:bd7aec9bc87268da5626bde871c958b4e8af21873a08c9e2df51a5a559e128bd +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..15f541d10d297e50610eaff9042669d6724b28d3 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8c3bcc737668edb42353aa48782660aa074bab2012d9f9f4dad7a02cb443f332 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..c1fc54eb4786e9f15244e8e4274b14688b87da5d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7062fa0264c6fb17100531852b46c235ce631a6626d5e19749a65ba8723532c0 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..398b79131d3cae7675a18194f162b0313d007af9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c9df94c5160293e4f8eaf0f0721da584cb55f7937f6d36d10f25a07c38254f58 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..9655df8da03b7d5c4966f22de2226d866b9ae32e --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/trainer_state.json @@ -0,0 +1,108 @@ +{ + "best_metric": 1.7716821432113647, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50", + "epoch": 0.6666666666666666, + "eval_steps": 10, + "global_step": 50, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 
0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 8193035993088000.0, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-50/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### 
Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + 
"query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..7048b4a80b90d2b0b1dc81801461060bebf56bc0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5608fa06baadcd2764c62f18752f6c0217816cf27d137cdd6edae562024691a8 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..cc3d343d0cc7d4a100283eb7dd626297134daa4d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c98d01c85e9f2ffd0619f1ad41ef1c0ff03bbabb9f5efdf909948c445ed13991 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..96edd96602542afab3935d537c8d1428ce43196b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:beda198a64f1e6f1db0895ff6a6859c2af4c98fbf9c15d1daa4dcca9c20f50be +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..fbee6cc358a849a2e6ad0190ef14f6bec9c7f0b9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c3b6d08f0efab509fa47adda4cb2617590b48fae148b2b1d7ad9bdd136db33bf +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/trainer_state.json new file mode 
100644 index 0000000000000000000000000000000000000000..2056c2206a05bbf6b873b0bff89ed5fd1f56b40d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/trainer_state.json @@ -0,0 +1,783 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 6.666666666666667, + "eval_steps": 10, + "global_step": 500, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 
22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + 
"eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 
2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 
2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 8.193035993088e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/training_args.bin new file mode 100644 
index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-500/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..8d26103f9f354dd4989f03ab6c8b00dccd1e18f6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:858ba343c728e630783edaa1a8d8a941a3e920de91fe56b05e3d1fb15c57cc67 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..e814d88889f4120279ea15a31d00488d3d418828 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:1fbb43cdb5b2482ba0c03b8f4db0716cbef69699aa4fe4255b89e8298b4baf46 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..52b85f2bd42c764f793cd9aa8382577ad1b51617 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:156b16fe2af6b1592b431fe36919ba4914ab9e672f318f884f5045be66654277 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..5956ec3e39af3bc3fb9d0a232274a6d261b86294 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2b422cfc4511add4fee84b58d2b77503686e4318ca44e2f3eeb2a6d766a05f46 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..dd54a8ef27373fdc2476c9668740385981d880a6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/trainer_state.json @@ -0,0 +1,798 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 6.8, + "eval_steps": 10, + "global_step": 510, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 
0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + 
"eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.045117139816284, + "learning_rate": 2.038518518518519e-05, + "loss": 1.0651, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 2.149211883544922, + "eval_runtime": 43.8417, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 510 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 8.35689671294976e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-510/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: 
peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..9eeb7d19706899b50ba3d0f2e737f620ea1bee41 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:82dd64054e8d0248c68b2ff06fcb3742b6066090b71ed7ef8acdc923c87b5a3c +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..e9d96b22067229d1e4c0bcf70e06b603911bcf50 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:47f7fc5ec417a0b94ef847daec66631bb57c91540e1ca520142730342797d104 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..736afdcce42e3e1d5dec3aedeed239bc0b63975c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ca29f15bc2264125f00923607dbea007ec921af3e528271a2bb77db5cd4d2b66 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..18564260f7c8ec509402d5441c373a7cdf17ffbb --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:daf8646637927378e3142ba5c3e76b6198b2f9e82397a25747c7a71154c155b6 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..94c658f4d98f26e83a28c2f307baf0ef69f4aae9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/trainer_state.json @@ -0,0 +1,813 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 6.933333333333334, + "eval_steps": 10, + "global_step": 520, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + 
"eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.045117139816284, + "learning_rate": 2.038518518518519e-05, + "loss": 1.0651, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 2.149211883544922, + "eval_runtime": 43.8417, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 3.004579782485962, + "learning_rate": 1.9200000000000003e-05, + "loss": 1.1477, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 2.15095853805542, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 520 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 8.52075743281152e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-520/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/README.md new file mode 100644 
index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..e253e311b3f45b46399675f34e94af16410e33ee --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:7f66ad78ce5ec8fceb45c44eb36f5623c802d272d06051a8cfb9616ca894a8e6 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..b6b732984415ad6b9aa9b3a8eae97438ed8f2822 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:24815a833058c7ac8eb5316262502b04dc29b0d26c39be632b1a68fe1257d2df +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..b0413aa128dc89fb63c7a74242ac1a6da3ecf5bf --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e9436217a6dd3838565d7b9845d97ff2e933eb514cc6ac99465ebc3448de3312 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..82c8385b0d588958c4972425bb801dcdba96fbea --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:07dceafe0645a9c4a3dc07f8de1e4122b78a7b1e3908687f72a6dfaf3de882a9 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..a677bbebe5f3cfe4c296a5f4a7a9fd1ff361a603 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/trainer_state.json @@ -0,0 +1,828 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 7.066666666666666, + "eval_steps": 10, + "global_step": 530, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + 
"eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.045117139816284, + "learning_rate": 2.038518518518519e-05, + "loss": 1.0651, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 2.149211883544922, + "eval_runtime": 43.8417, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 3.004579782485962, + "learning_rate": 1.9200000000000003e-05, + "loss": 1.1477, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 2.15095853805542, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 3.0588927268981934, + "learning_rate": 1.8014814814814817e-05, + "loss": 1.1034, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 2.1752231121063232, + "eval_runtime": 43.8223, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 530 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 8.68461815267328e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-530/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 
diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..4a382d73d10d1e43aaaa63a9085439ff29cb37a7 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:65269a182d907db1a281c91ce9bb41a055bcce4f15135e82b36bdf077d6ba220 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..a2f382cd2ea1cc1088e40eb039c69501700dbc4d --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d352f856932ac4a4011252fcce29497cbb80f70f353fa772abb6720c9118cde8 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..8d48caf21e655a01d7675a2b465c934cea676943 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:816bfad4f86e01da7fe3bd5bf7d10c902cf135a5b5fec9e0170158290fe5828c +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..01bccc59576f268878581186ba1465c225b58a4d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:77b038e9d134ab7ee64cb0890175514c4954ad3a74dbbfaf69c7faa6673c7f7e +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..9801b77f15f92973c86e0cc0bfc0d6f3b1b1e532 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/trainer_state.json @@ -0,0 +1,843 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 7.2, + "eval_steps": 10, + "global_step": 540, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 
0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + 
"eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.045117139816284, + "learning_rate": 2.038518518518519e-05, + "loss": 1.0651, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 2.149211883544922, + "eval_runtime": 43.8417, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 3.004579782485962, + "learning_rate": 1.9200000000000003e-05, + "loss": 1.1477, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 2.15095853805542, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 3.0588927268981934, + "learning_rate": 1.8014814814814817e-05, + "loss": 1.1034, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 2.1752231121063232, + "eval_runtime": 43.8223, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 3.6574132442474365, + "learning_rate": 1.682962962962963e-05, + "loss": 0.9523, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 2.224954128265381, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 540 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 8.84847887253504e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-540/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..4fcb20a112588791b1baa40bf3e01cab9b22c540 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0b2487be93d7a2d9cb8f331ef7a17cdb540a005092088089c96897a20ae9540d +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..86bc2324ec37022a75a67ebaf5b3fd2f066aca08 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:04bc887f4ff0e91e9603e0469352eea5e7a84a84ad8319d20a2306488681e8f4 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..9dc1ec111f2a6f7fbe8d878013e83df65b5f618a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5b6faa8c50c89ce52c86274c8c795afb3f00524e7aef4544572df4b5b6b12c6d +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..086577d758a08f4e03a2097f0afd795a311ba097 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2b71c7a1d4d8c23c15903d21d92890023edaf3fe70a00d3310ebb4c311096c83 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..c1d8c946b0d175a974c9f4443bc41253db4f7714 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/trainer_state.json @@ -0,0 +1,858 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 7.333333333333333, + "eval_steps": 10, + "global_step": 550, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + 
"eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.045117139816284, + "learning_rate": 2.038518518518519e-05, + "loss": 1.0651, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 2.149211883544922, + "eval_runtime": 43.8417, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 3.004579782485962, + "learning_rate": 1.9200000000000003e-05, + "loss": 1.1477, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 2.15095853805542, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 3.0588927268981934, + "learning_rate": 1.8014814814814817e-05, + "loss": 1.1034, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 2.1752231121063232, + "eval_runtime": 43.8223, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 3.6574132442474365, + "learning_rate": 1.682962962962963e-05, + "loss": 0.9523, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 2.224954128265381, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.6502890586853027, + "learning_rate": 1.5644444444444448e-05, + "loss": 0.9977, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 2.2001636028289795, + "eval_runtime": 43.8171, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 550 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 9.0123395923968e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-550/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..91dff152f96cc8de05a09bf782234b13bf5b37c6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:87534cdc743f95a5e82b53c721bb05e034569d4d037d1a72c47b5ed320c917df +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..1df4574fa9e06328780a1fe385a69528587d2015 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:98b91f2ddac506969b0587899882a4d33cf0a9f7c2a17b5dc35db983b3807f05 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..75311ff97c8628cb71fe6f6cdca5e9e1127d30b6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3b6745ab2a92f54dcacb73c3ceec9d54235e5b225134fb7703879ee6185ad897 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..65a0ddaad7e280b4954a92629be4a7dc2dec2561 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ff76422dc146604f6a50e660823573feea0b9519c0669db4d9e31680ec6f059f +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..48281899de79cc49f67a5abd0ddbe7c271d22a47 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/trainer_state.json @@ -0,0 +1,873 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 7.466666666666667, + "eval_steps": 10, + "global_step": 560, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + 
"eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.045117139816284, + "learning_rate": 2.038518518518519e-05, + "loss": 1.0651, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 2.149211883544922, + "eval_runtime": 43.8417, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 3.004579782485962, + "learning_rate": 1.9200000000000003e-05, + "loss": 1.1477, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 2.15095853805542, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 3.0588927268981934, + "learning_rate": 1.8014814814814817e-05, + "loss": 1.1034, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 2.1752231121063232, + "eval_runtime": 43.8223, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 3.6574132442474365, + "learning_rate": 1.682962962962963e-05, + "loss": 0.9523, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 2.224954128265381, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.6502890586853027, + "learning_rate": 1.5644444444444448e-05, + "loss": 0.9977, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 2.2001636028289795, + "eval_runtime": 43.8171, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.5509214401245117, + "learning_rate": 1.445925925925926e-05, + "loss": 1.0191, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 2.206838846206665, + "eval_runtime": 43.8328, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 560 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": 
false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 9.17620031225856e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-560/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..8338c302b78be584f87ed5a035fbd263d30fa0ae --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:466c25e388c2212f91102891551f91f754e271cc7ebf185435722b478fcd9a38 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..89f4d42e2f40368e26e9ac22da5ed75cd5187fbd --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b7f015d521812cad3de1a222724daff7a1633516a8cd06e1046d534e910ab6d2 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..3ed38f9a78b3dbf6f2e73e5bd68681ac198b1983 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:d966d92a47b281ed57ee7f44ee2eaa60a54786f7ca9b7e8829ab8723bc8a5a1d +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..5a6a4b0cbd4610c091980d2e81f14762590d01ca --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3374b2300d9a79acff0b4473325fb9c6125ff89ba7826b9a37f7c4f6e43803c9 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..ba8c87b899a4032eaa459bbe5f8a3449fb3458e5 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/trainer_state.json @@ -0,0 +1,888 @@ +{ + "best_metric": 
1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 7.6, + "eval_steps": 10, + "global_step": 570, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + 
"eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + 
"eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, 
+ "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + 
"eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.045117139816284, + "learning_rate": 2.038518518518519e-05, + "loss": 1.0651, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 2.149211883544922, + "eval_runtime": 43.8417, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 3.004579782485962, + "learning_rate": 1.9200000000000003e-05, + "loss": 1.1477, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 2.15095853805542, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 3.0588927268981934, + "learning_rate": 1.8014814814814817e-05, + "loss": 1.1034, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 2.1752231121063232, + "eval_runtime": 43.8223, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 3.6574132442474365, + "learning_rate": 1.682962962962963e-05, + "loss": 0.9523, + "step": 540 + }, + { + "epoch": 7.2, + 
"eval_loss": 2.224954128265381, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.6502890586853027, + "learning_rate": 1.5644444444444448e-05, + "loss": 0.9977, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 2.2001636028289795, + "eval_runtime": 43.8171, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.5509214401245117, + "learning_rate": 1.445925925925926e-05, + "loss": 1.0191, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 2.206838846206665, + "eval_runtime": 43.8328, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.4937808513641357, + "learning_rate": 1.3274074074074074e-05, + "loss": 0.9302, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 2.2026851177215576, + "eval_runtime": 43.8326, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 570 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 9.34006103212032e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-570/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More 
Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + 
"megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..bd7a9706f63aea252ebdfca3b46cf2bbae58c2b5 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2e1cbd7164e97cf9a50c266b1cb11c0a48f0a943b2142cccce589f92e007af6a +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..f2e87e439e37efa1cc8e10942e198d62106c2444 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a04e3860dc27e3305d2cb537b091c4efadbe0d11a13a3719d8560f169da74fb4 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..6f12baaba3ec135e726e0b75dc20ee8cfe8a995d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:55a6ddc6425602c9554969e2910a1ee66847f95ab8fd86352843e16c6530b2c0 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..bf28af489a48d44ae76afd3ccf19dea4d48ac17e --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:913ed29eb3f59230a2cd25bc3087453a0526e3b77b8e3f9ad03006861c52c3f2 +size 627 diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..73491292c1e9dd01d24a00a2932dc476cd5d4847 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/trainer_state.json @@ -0,0 +1,903 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 7.733333333333333, + "eval_steps": 10, + "global_step": 580, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, 
+ "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + 
"eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 
1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 
5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.045117139816284, + "learning_rate": 2.038518518518519e-05, + "loss": 1.0651, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 2.149211883544922, + "eval_runtime": 43.8417, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 3.004579782485962, + "learning_rate": 1.9200000000000003e-05, + "loss": 1.1477, + "step": 520 + }, + { + "epoch": 
6.933333333333334, + "eval_loss": 2.15095853805542, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 3.0588927268981934, + "learning_rate": 1.8014814814814817e-05, + "loss": 1.1034, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 2.1752231121063232, + "eval_runtime": 43.8223, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 3.6574132442474365, + "learning_rate": 1.682962962962963e-05, + "loss": 0.9523, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 2.224954128265381, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.6502890586853027, + "learning_rate": 1.5644444444444448e-05, + "loss": 0.9977, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 2.2001636028289795, + "eval_runtime": 43.8171, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.5509214401245117, + "learning_rate": 1.445925925925926e-05, + "loss": 1.0191, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 2.206838846206665, + "eval_runtime": 43.8328, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.4937808513641357, + "learning_rate": 1.3274074074074074e-05, + "loss": 0.9302, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 2.2026851177215576, + "eval_runtime": 43.8326, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 570 + }, + { + "epoch": 7.733333333333333, + "grad_norm": 3.4799985885620117, + "learning_rate": 1.208888888888889e-05, + "loss": 1.0328, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 2.202017068862915, + "eval_runtime": 43.8401, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 580 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 9.50392175198208e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-580/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/README.md 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..5db6d4e33daca7d61363f1a6985920394b14c75f --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4c4c3153bb3062f322de386cbe87388957efae89162ee4ba1cca6ffd5e24ec07 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..4b5a0076c40e44141527a183f24e8e89eec46058 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:80cbb8dc44edf57a220c5ac6a31b7623c96c13ef60017c8f4b9388306cdd430f +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..f2cbe02e4922a4920c0a827f09f6df580967beb0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c5704b322a17ce5b2788c1247543e3ca9edc36d083fd8ecc8ca80d04334c6030 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..0d7bf244a42236b6c036bb05b854765134bfb7dc --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:08319107f1dbff6981178f60c28e841b46f0d4a78bd6eda8598e98a22b1e7553 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..1b43c90254806d42c58ceffc414b81f5974f1bc1 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/trainer_state.json @@ -0,0 +1,918 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 7.866666666666667, + "eval_steps": 10, + "global_step": 590, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + 
"eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.045117139816284, + "learning_rate": 2.038518518518519e-05, + "loss": 1.0651, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 2.149211883544922, + "eval_runtime": 43.8417, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 3.004579782485962, + "learning_rate": 1.9200000000000003e-05, + "loss": 1.1477, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 2.15095853805542, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 3.0588927268981934, + "learning_rate": 1.8014814814814817e-05, + "loss": 1.1034, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 2.1752231121063232, + "eval_runtime": 43.8223, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 3.6574132442474365, + "learning_rate": 1.682962962962963e-05, + "loss": 0.9523, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 2.224954128265381, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.6502890586853027, + "learning_rate": 1.5644444444444448e-05, + "loss": 0.9977, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 2.2001636028289795, + "eval_runtime": 43.8171, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.5509214401245117, + "learning_rate": 1.445925925925926e-05, + "loss": 1.0191, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 2.206838846206665, + "eval_runtime": 43.8328, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.4937808513641357, + "learning_rate": 1.3274074074074074e-05, + "loss": 0.9302, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 2.2026851177215576, + "eval_runtime": 43.8326, + "eval_samples_per_second": 22.814, + 
"eval_steps_per_second": 2.852, + "step": 570 + }, + { + "epoch": 7.733333333333333, + "grad_norm": 3.4799985885620117, + "learning_rate": 1.208888888888889e-05, + "loss": 1.0328, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 2.202017068862915, + "eval_runtime": 43.8401, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 3.961993932723999, + "learning_rate": 1.0903703703703706e-05, + "loss": 1.0468, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 2.2057199478149414, + "eval_runtime": 43.8449, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 590 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 9.66778247184384e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-590/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users 
(both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..e380b0bf0a62c5ea7b9e41201c3f7c48ab005841 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ceaf50c2bdda91cc6409dda27d5ddd0ca21ec0812b880312110fc9986c9b58c7 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..333d676d7a4fa3b2f8438fc67c06e6fd17c4af38 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:85c7337b9502cc13157a281b5d78883a84c1f534fe37a7764f824d8ab3e5fce3 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..3d041c10a3af80c2be01488b87e7c23a107acab4 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:224b98cd2a3813f8f156af229101dde99ced2e24294f3d7ad7b1538fdc49c27c +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..84318e6776594d61c8499add435aeacb07e6836d --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e008a2a60ac0146b8fb53c899e87a58b9b9d6047332cda4d0f382eefe44abc3e +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..deb1cd6e2e676de30ae3f2cea77fe493e4ebf760 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/trainer_state.json @@ -0,0 +1,123 @@ +{ + "best_metric": 1.7701489925384521, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60", + "epoch": 0.8, + "eval_steps": 10, + "global_step": 60, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 9831643191705600.0, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-60/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..03b6c192bce370b01635bcda48dc2ead73f2b114 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:3f040b3635b9f1924834892e62bb69cc6778b2c37b73dc4d39f9ed0b8698adf0 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..7de4878631ac9700a77348450f09c5c6c05e6345 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:57fd36712e1baed146da9da8881dfb66476d2320126b7b4a2be72e793712e2af +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..ef40b259bc3233779099c3b8651c2fe0a9d07fa5 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9bbc772ea5a37ab482a5fa0d13a2014584215ee3da6246ff6fe50fb8dafbfb8e +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..175ccc453736170163e5873ef983820f76aa7e55 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e458eadf68727bcab3930cfd1f9cf6d57f08827e77b09bdc58406b68c3e696e3 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..d196c12ebebc531ca55a2ad9a06b4635e3ae3fd6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/trainer_state.json @@ -0,0 +1,933 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 8.0, + "eval_steps": 10, + "global_step": 600, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 
0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + 
"eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.045117139816284, + "learning_rate": 2.038518518518519e-05, + "loss": 1.0651, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 2.149211883544922, + "eval_runtime": 43.8417, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 3.004579782485962, + "learning_rate": 1.9200000000000003e-05, + "loss": 1.1477, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 2.15095853805542, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 3.0588927268981934, + "learning_rate": 1.8014814814814817e-05, + "loss": 1.1034, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 2.1752231121063232, + "eval_runtime": 43.8223, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 3.6574132442474365, + "learning_rate": 1.682962962962963e-05, + "loss": 0.9523, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 2.224954128265381, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.6502890586853027, + "learning_rate": 1.5644444444444448e-05, + "loss": 0.9977, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 2.2001636028289795, + "eval_runtime": 43.8171, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.5509214401245117, + "learning_rate": 1.445925925925926e-05, + "loss": 1.0191, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 2.206838846206665, + "eval_runtime": 43.8328, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.4937808513641357, + "learning_rate": 1.3274074074074074e-05, + "loss": 0.9302, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 2.2026851177215576, + "eval_runtime": 43.8326, + "eval_samples_per_second": 22.814, + 
"eval_steps_per_second": 2.852, + "step": 570 + }, + { + "epoch": 7.733333333333333, + "grad_norm": 3.4799985885620117, + "learning_rate": 1.208888888888889e-05, + "loss": 1.0328, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 2.202017068862915, + "eval_runtime": 43.8401, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 3.961993932723999, + "learning_rate": 1.0903703703703706e-05, + "loss": 1.0468, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 2.2057199478149414, + "eval_runtime": 43.8449, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 590 + }, + { + "epoch": 8.0, + "grad_norm": 3.0407888889312744, + "learning_rate": 9.837037037037038e-06, + "loss": 1.0029, + "step": 600 + }, + { + "epoch": 8.0, + "eval_loss": 2.2097363471984863, + "eval_runtime": 43.8387, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 600 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 9.8316431917056e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-600/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More 
Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + 
"rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..23fcacd12ca65840a5f64b71b9d05a0e7778cb7c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ad8a2f465978cc2da3c7c3f543d82215c53c9c3771b8c657035a0737f00024f3 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..42274d14dc48c4a96b315c6309c41176a9bbb2e9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b40e019afc15927d9b8722a47b71a14647170b9cf2612731eb7a08fac1a9908d +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..6a970899a5edc16268fdea83560e0495a3d06810 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fa5b53289977451ca52671d3897055616936322daf22f6e4246ff72a467aef1c +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..9b9fdd7729d6b267eb60a9f9fc1c4b0da06afadb --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0dc531ffeec52ecbfc5abf24609b1f6348758429ff8e4a6ba63ac2e497a6e2d6 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/trainer_state.json 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..98902df2d67a98ea198251398003482b65f3ac9b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/trainer_state.json @@ -0,0 +1,948 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 8.133333333333333, + "eval_steps": 10, + "global_step": 610, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + 
"learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + 
"grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 
4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + 
"step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.045117139816284, + "learning_rate": 2.038518518518519e-05, + "loss": 1.0651, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 2.149211883544922, + "eval_runtime": 43.8417, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 3.004579782485962, + "learning_rate": 1.9200000000000003e-05, + "loss": 1.1477, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 2.15095853805542, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 
520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 3.0588927268981934, + "learning_rate": 1.8014814814814817e-05, + "loss": 1.1034, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 2.1752231121063232, + "eval_runtime": 43.8223, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 3.6574132442474365, + "learning_rate": 1.682962962962963e-05, + "loss": 0.9523, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 2.224954128265381, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.6502890586853027, + "learning_rate": 1.5644444444444448e-05, + "loss": 0.9977, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 2.2001636028289795, + "eval_runtime": 43.8171, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.5509214401245117, + "learning_rate": 1.445925925925926e-05, + "loss": 1.0191, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 2.206838846206665, + "eval_runtime": 43.8328, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.4937808513641357, + "learning_rate": 1.3274074074074074e-05, + "loss": 0.9302, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 2.2026851177215576, + "eval_runtime": 43.8326, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 570 + }, + { + "epoch": 7.733333333333333, + "grad_norm": 3.4799985885620117, + "learning_rate": 1.208888888888889e-05, + "loss": 1.0328, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 2.202017068862915, + "eval_runtime": 43.8401, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 3.961993932723999, + "learning_rate": 1.0903703703703706e-05, + "loss": 1.0468, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 2.2057199478149414, + "eval_runtime": 43.8449, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 590 + }, + { + "epoch": 8.0, + "grad_norm": 3.0407888889312744, + "learning_rate": 9.837037037037038e-06, + "loss": 1.0029, + "step": 600 + }, + { + "epoch": 8.0, + "eval_loss": 2.2097363471984863, + "eval_runtime": 43.8387, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 600 + }, + { + "epoch": 8.133333333333333, + "grad_norm": 3.244175910949707, + "learning_rate": 8.651851851851852e-06, + "loss": 0.9522, + "step": 610 + }, + { + "epoch": 8.133333333333333, + "eval_loss": 2.2325267791748047, + "eval_runtime": 43.8391, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 610 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 9.99550391156736e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-610/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..076580d656719fda9ab5724ad6624adb2c60e1c1 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e03c0b262529f42ffba24d444e2a90ccc48a431c2b3ce2fe7b41337c2135d45d +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..dcde68fc2c23fd4b7742fcd281a366fb4dc5bd1c --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cf556b64b9e98f1c6997a784b16804864027dab879b9cac69b611f7fe9551ea6 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..da7e5f0f7045f8fad1c1529974e555cc67b8f5f0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f2b2ce429e00eba0165cdfd527b7ca384fed68ae5660561d0cbc6dbdd51ce7f1 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..d89249d27daf3c057419f9a95d33d7f5950ceb2b --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fb8d6fa7dc4e8e47e6cb3df304a96784e1dd00ccd79a7f138066fa5478daee73 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..1099bc9aff37e7daaa3f4f2e9a0e3b65c57e510a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/trainer_state.json @@ -0,0 +1,963 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 8.266666666666667, + "eval_steps": 10, + "global_step": 620, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + 
"eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.045117139816284, + "learning_rate": 2.038518518518519e-05, + "loss": 1.0651, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 2.149211883544922, + "eval_runtime": 43.8417, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 3.004579782485962, + "learning_rate": 1.9200000000000003e-05, + "loss": 1.1477, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 2.15095853805542, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 3.0588927268981934, + "learning_rate": 1.8014814814814817e-05, + "loss": 1.1034, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 2.1752231121063232, + "eval_runtime": 43.8223, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 3.6574132442474365, + "learning_rate": 1.682962962962963e-05, + "loss": 0.9523, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 2.224954128265381, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.6502890586853027, + "learning_rate": 1.5644444444444448e-05, + "loss": 0.9977, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 2.2001636028289795, + "eval_runtime": 43.8171, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.5509214401245117, + "learning_rate": 1.445925925925926e-05, + "loss": 1.0191, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 2.206838846206665, + "eval_runtime": 43.8328, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.4937808513641357, + "learning_rate": 1.3274074074074074e-05, + "loss": 0.9302, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 2.2026851177215576, + "eval_runtime": 43.8326, + "eval_samples_per_second": 22.814, + 
"eval_steps_per_second": 2.852, + "step": 570 + }, + { + "epoch": 7.733333333333333, + "grad_norm": 3.4799985885620117, + "learning_rate": 1.208888888888889e-05, + "loss": 1.0328, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 2.202017068862915, + "eval_runtime": 43.8401, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 3.961993932723999, + "learning_rate": 1.0903703703703706e-05, + "loss": 1.0468, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 2.2057199478149414, + "eval_runtime": 43.8449, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 590 + }, + { + "epoch": 8.0, + "grad_norm": 3.0407888889312744, + "learning_rate": 9.837037037037038e-06, + "loss": 1.0029, + "step": 600 + }, + { + "epoch": 8.0, + "eval_loss": 2.2097363471984863, + "eval_runtime": 43.8387, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 600 + }, + { + "epoch": 8.133333333333333, + "grad_norm": 3.244175910949707, + "learning_rate": 8.651851851851852e-06, + "loss": 0.9522, + "step": 610 + }, + { + "epoch": 8.133333333333333, + "eval_loss": 2.2325267791748047, + "eval_runtime": 43.8391, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 610 + }, + { + "epoch": 8.266666666666667, + "grad_norm": 4.19291353225708, + "learning_rate": 7.4666666666666675e-06, + "loss": 0.9562, + "step": 620 + }, + { + "epoch": 8.266666666666667, + "eval_loss": 2.2521543502807617, + "eval_runtime": 43.8293, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 620 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.015936463142912e+17, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-620/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/README.md @@ -0,0 +1,202 @@ +--- +base_model: 
/workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..e3ddc8aa8f3e3cf8b90377b870c2a60a1c723bdf --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8014c06c2a43fdf8d8f3a2b81b95de166a880d207943dfa2210a7de4a32dd4b8 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..d3b5fca86c5bb144bfd06f75eab464d374a2cb2d --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:040a7879cf70dc2d63017cc707d338fbaeabc0184555a71e321f92ba15dd3ce6 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..96d7a3f6be074e46014211fae837a521e5c5140c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dd6c4f62bed5401eddcf930d960632a48c624bea715ca64cedd7d04db198b4a0 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..4d7896da2bd52a95545b5fec6697a71326e400ca --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2c8caf8a613123abed629dc2ccb61ff3ac09163ab04f0cdab2b2e2c909454f1c +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..ab24561f94c34700d2e990200ec1f53b025a5268 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/trainer_state.json @@ -0,0 +1,978 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 8.4, + "eval_steps": 10, + "global_step": 630, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 
0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + 
"eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.045117139816284, + "learning_rate": 2.038518518518519e-05, + "loss": 1.0651, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 2.149211883544922, + "eval_runtime": 43.8417, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 3.004579782485962, + "learning_rate": 1.9200000000000003e-05, + "loss": 1.1477, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 2.15095853805542, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 3.0588927268981934, + "learning_rate": 1.8014814814814817e-05, + "loss": 1.1034, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 2.1752231121063232, + "eval_runtime": 43.8223, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 3.6574132442474365, + "learning_rate": 1.682962962962963e-05, + "loss": 0.9523, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 2.224954128265381, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.6502890586853027, + "learning_rate": 1.5644444444444448e-05, + "loss": 0.9977, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 2.2001636028289795, + "eval_runtime": 43.8171, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.5509214401245117, + "learning_rate": 1.445925925925926e-05, + "loss": 1.0191, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 2.206838846206665, + "eval_runtime": 43.8328, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.4937808513641357, + "learning_rate": 1.3274074074074074e-05, + "loss": 0.9302, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 2.2026851177215576, + "eval_runtime": 43.8326, + "eval_samples_per_second": 22.814, + 
"eval_steps_per_second": 2.852, + "step": 570 + }, + { + "epoch": 7.733333333333333, + "grad_norm": 3.4799985885620117, + "learning_rate": 1.208888888888889e-05, + "loss": 1.0328, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 2.202017068862915, + "eval_runtime": 43.8401, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 3.961993932723999, + "learning_rate": 1.0903703703703706e-05, + "loss": 1.0468, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 2.2057199478149414, + "eval_runtime": 43.8449, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 590 + }, + { + "epoch": 8.0, + "grad_norm": 3.0407888889312744, + "learning_rate": 9.837037037037038e-06, + "loss": 1.0029, + "step": 600 + }, + { + "epoch": 8.0, + "eval_loss": 2.2097363471984863, + "eval_runtime": 43.8387, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 600 + }, + { + "epoch": 8.133333333333333, + "grad_norm": 3.244175910949707, + "learning_rate": 8.651851851851852e-06, + "loss": 0.9522, + "step": 610 + }, + { + "epoch": 8.133333333333333, + "eval_loss": 2.2325267791748047, + "eval_runtime": 43.8391, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 610 + }, + { + "epoch": 8.266666666666667, + "grad_norm": 4.19291353225708, + "learning_rate": 7.4666666666666675e-06, + "loss": 0.9562, + "step": 620 + }, + { + "epoch": 8.266666666666667, + "eval_loss": 2.2521543502807617, + "eval_runtime": 43.8293, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 620 + }, + { + "epoch": 8.4, + "grad_norm": 2.8447179794311523, + "learning_rate": 6.2814814814814814e-06, + "loss": 1.041, + "step": 630 + }, + { + "epoch": 8.4, + "eval_loss": 2.246365785598755, + "eval_runtime": 43.8554, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 630 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.032322535129088e+17, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-630/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/README.md new file mode 
100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..0575bc0bcfd27ec3edb9084136b212f206fa04b1 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b66102e3df7acb0090e1bbe69db11427cd203085cb3a5cd1d8928d9a5a5b546d +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..fa5392fec117f6e44e0974fae23a5cf9f33572df --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:cde79702c8555f7e08aa1652fa1c6de91c3ec4beac3d87e9dcc58464a022d619 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..bc02fa7e506af341c87e94bd62a6cbdfbd057096 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0597f3b9ac321e002676eb1712670348770197d9b197cdd7a7e16f465315444e +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..05c3bf41fac8f20f7910b7d2ca1a683b17499f67 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:386acae1b7fcd2c80fa688722046f067276388329246e55a0257f90306ce8e65 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..98fd5b70232e5ad64905a2d975a384b3a074d6b8 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/trainer_state.json @@ -0,0 +1,993 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 8.533333333333333, + "eval_steps": 10, + "global_step": 640, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + 
"eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.045117139816284, + "learning_rate": 2.038518518518519e-05, + "loss": 1.0651, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 2.149211883544922, + "eval_runtime": 43.8417, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 3.004579782485962, + "learning_rate": 1.9200000000000003e-05, + "loss": 1.1477, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 2.15095853805542, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 3.0588927268981934, + "learning_rate": 1.8014814814814817e-05, + "loss": 1.1034, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 2.1752231121063232, + "eval_runtime": 43.8223, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 3.6574132442474365, + "learning_rate": 1.682962962962963e-05, + "loss": 0.9523, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 2.224954128265381, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.6502890586853027, + "learning_rate": 1.5644444444444448e-05, + "loss": 0.9977, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 2.2001636028289795, + "eval_runtime": 43.8171, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.5509214401245117, + "learning_rate": 1.445925925925926e-05, + "loss": 1.0191, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 2.206838846206665, + "eval_runtime": 43.8328, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.4937808513641357, + "learning_rate": 1.3274074074074074e-05, + "loss": 0.9302, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 2.2026851177215576, + "eval_runtime": 43.8326, + "eval_samples_per_second": 22.814, + 
"eval_steps_per_second": 2.852, + "step": 570 + }, + { + "epoch": 7.733333333333333, + "grad_norm": 3.4799985885620117, + "learning_rate": 1.208888888888889e-05, + "loss": 1.0328, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 2.202017068862915, + "eval_runtime": 43.8401, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 3.961993932723999, + "learning_rate": 1.0903703703703706e-05, + "loss": 1.0468, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 2.2057199478149414, + "eval_runtime": 43.8449, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 590 + }, + { + "epoch": 8.0, + "grad_norm": 3.0407888889312744, + "learning_rate": 9.837037037037038e-06, + "loss": 1.0029, + "step": 600 + }, + { + "epoch": 8.0, + "eval_loss": 2.2097363471984863, + "eval_runtime": 43.8387, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 600 + }, + { + "epoch": 8.133333333333333, + "grad_norm": 3.244175910949707, + "learning_rate": 8.651851851851852e-06, + "loss": 0.9522, + "step": 610 + }, + { + "epoch": 8.133333333333333, + "eval_loss": 2.2325267791748047, + "eval_runtime": 43.8391, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 610 + }, + { + "epoch": 8.266666666666667, + "grad_norm": 4.19291353225708, + "learning_rate": 7.4666666666666675e-06, + "loss": 0.9562, + "step": 620 + }, + { + "epoch": 8.266666666666667, + "eval_loss": 2.2521543502807617, + "eval_runtime": 43.8293, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 620 + }, + { + "epoch": 8.4, + "grad_norm": 2.8447179794311523, + "learning_rate": 6.2814814814814814e-06, + "loss": 1.041, + "step": 630 + }, + { + "epoch": 8.4, + "eval_loss": 2.246365785598755, + "eval_runtime": 43.8554, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 630 + }, + { + "epoch": 8.533333333333333, + "grad_norm": 3.6524436473846436, + "learning_rate": 5.096296296296297e-06, + "loss": 0.9336, + "step": 640 + }, + { + "epoch": 8.533333333333333, + "eval_loss": 2.2426538467407227, + "eval_runtime": 43.8259, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 640 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.048708607115264e+17, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-640/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 
4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..2ea496bd347ef03ead88be345769c7b08aba8657 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f12bbb0b1b24c2f699d2a84ccdf4271e57e680a9eac62c5943384098ffaede5e +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..7fa4586051493ec8c1f1041051e81e01e8358851 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a7f239178aefa984e16a21c7202fff593c6838213addec101fccea2ab3048c23 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..4d763156eb3a586b51733d4ec683a815a6ae5fab --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:e66e316bd2615a5005aac13970f8b8e71830843ea716191e53ff7dc38997af08 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..fe70d8fde63930cd36c8ebd9bcdde7073431b63c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5a6198dfb5ae85225d4b22d5a1acaf3b755df5fab28d0bdf203d64bd6988e60b +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..ce931ef31f48d0dfc7ae2e309b1c62820afb47b6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/trainer_state.json @@ -0,0 +1,1008 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 8.666666666666666, + "eval_steps": 10, + "global_step": 650, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + 
"eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.045117139816284, + "learning_rate": 2.038518518518519e-05, + "loss": 1.0651, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 2.149211883544922, + "eval_runtime": 43.8417, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 3.004579782485962, + "learning_rate": 1.9200000000000003e-05, + "loss": 1.1477, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 2.15095853805542, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 3.0588927268981934, + "learning_rate": 1.8014814814814817e-05, + "loss": 1.1034, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 2.1752231121063232, + "eval_runtime": 43.8223, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 3.6574132442474365, + "learning_rate": 1.682962962962963e-05, + "loss": 0.9523, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 2.224954128265381, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.6502890586853027, + "learning_rate": 1.5644444444444448e-05, + "loss": 0.9977, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 2.2001636028289795, + "eval_runtime": 43.8171, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.5509214401245117, + "learning_rate": 1.445925925925926e-05, + "loss": 1.0191, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 2.206838846206665, + "eval_runtime": 43.8328, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.4937808513641357, + "learning_rate": 1.3274074074074074e-05, + "loss": 0.9302, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 2.2026851177215576, + "eval_runtime": 43.8326, + "eval_samples_per_second": 22.814, + 
"eval_steps_per_second": 2.852, + "step": 570 + }, + { + "epoch": 7.733333333333333, + "grad_norm": 3.4799985885620117, + "learning_rate": 1.208888888888889e-05, + "loss": 1.0328, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 2.202017068862915, + "eval_runtime": 43.8401, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 3.961993932723999, + "learning_rate": 1.0903703703703706e-05, + "loss": 1.0468, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 2.2057199478149414, + "eval_runtime": 43.8449, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 590 + }, + { + "epoch": 8.0, + "grad_norm": 3.0407888889312744, + "learning_rate": 9.837037037037038e-06, + "loss": 1.0029, + "step": 600 + }, + { + "epoch": 8.0, + "eval_loss": 2.2097363471984863, + "eval_runtime": 43.8387, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 600 + }, + { + "epoch": 8.133333333333333, + "grad_norm": 3.244175910949707, + "learning_rate": 8.651851851851852e-06, + "loss": 0.9522, + "step": 610 + }, + { + "epoch": 8.133333333333333, + "eval_loss": 2.2325267791748047, + "eval_runtime": 43.8391, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 610 + }, + { + "epoch": 8.266666666666667, + "grad_norm": 4.19291353225708, + "learning_rate": 7.4666666666666675e-06, + "loss": 0.9562, + "step": 620 + }, + { + "epoch": 8.266666666666667, + "eval_loss": 2.2521543502807617, + "eval_runtime": 43.8293, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 620 + }, + { + "epoch": 8.4, + "grad_norm": 2.8447179794311523, + "learning_rate": 6.2814814814814814e-06, + "loss": 1.041, + "step": 630 + }, + { + "epoch": 8.4, + "eval_loss": 2.246365785598755, + "eval_runtime": 43.8554, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 630 + }, + { + "epoch": 8.533333333333333, + "grad_norm": 3.6524436473846436, + "learning_rate": 5.096296296296297e-06, + "loss": 0.9336, + "step": 640 + }, + { + "epoch": 8.533333333333333, + "eval_loss": 2.2426538467407227, + "eval_runtime": 43.8259, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 640 + }, + { + "epoch": 8.666666666666666, + "grad_norm": 3.7157490253448486, + "learning_rate": 3.911111111111112e-06, + "loss": 0.9294, + "step": 650 + }, + { + "epoch": 8.666666666666666, + "eval_loss": 2.24678897857666, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 650 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.06509467910144e+17, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/training_args.bin new file mode 100644 index 
0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-650/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..34dd07c304fe37880daca8a00962ac6f33f24b17 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a9c337fb4e0ae679cbe4c31e667f25bcea42f5e7aa74489b53a9b215c11b0b12 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..6d40065433e5a0a31d7464ff83f463fba4b1caee --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:075930ffae4d79043affe27790169568c563578df9ba6d28d6eb6551a9d76d74 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..1bd0e24dcfea6867dcdb66e0b90f3344dbd9d339 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:66fa7ea9452d536e82e5c18c4a0a05615143763aa569d9af13553a06a11128de +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..aa32157caa8adce10f6dfc5620fa1497a84dee8c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4055e09e21e7966a34c52baedc4612b0f471569c429a9af6ff38f43315827940 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..a148dad3ac136808f75b9f68d7f8180c88a3bd08 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/trainer_state.json @@ -0,0 +1,1023 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 8.8, + "eval_steps": 10, + "global_step": 660, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 
0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + 
"eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.045117139816284, + "learning_rate": 2.038518518518519e-05, + "loss": 1.0651, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 2.149211883544922, + "eval_runtime": 43.8417, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 3.004579782485962, + "learning_rate": 1.9200000000000003e-05, + "loss": 1.1477, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 2.15095853805542, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 3.0588927268981934, + "learning_rate": 1.8014814814814817e-05, + "loss": 1.1034, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 2.1752231121063232, + "eval_runtime": 43.8223, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 3.6574132442474365, + "learning_rate": 1.682962962962963e-05, + "loss": 0.9523, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 2.224954128265381, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.6502890586853027, + "learning_rate": 1.5644444444444448e-05, + "loss": 0.9977, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 2.2001636028289795, + "eval_runtime": 43.8171, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.5509214401245117, + "learning_rate": 1.445925925925926e-05, + "loss": 1.0191, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 2.206838846206665, + "eval_runtime": 43.8328, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.4937808513641357, + "learning_rate": 1.3274074074074074e-05, + "loss": 0.9302, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 2.2026851177215576, + "eval_runtime": 43.8326, + "eval_samples_per_second": 22.814, + 
"eval_steps_per_second": 2.852, + "step": 570 + }, + { + "epoch": 7.733333333333333, + "grad_norm": 3.4799985885620117, + "learning_rate": 1.208888888888889e-05, + "loss": 1.0328, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 2.202017068862915, + "eval_runtime": 43.8401, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 3.961993932723999, + "learning_rate": 1.0903703703703706e-05, + "loss": 1.0468, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 2.2057199478149414, + "eval_runtime": 43.8449, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 590 + }, + { + "epoch": 8.0, + "grad_norm": 3.0407888889312744, + "learning_rate": 9.837037037037038e-06, + "loss": 1.0029, + "step": 600 + }, + { + "epoch": 8.0, + "eval_loss": 2.2097363471984863, + "eval_runtime": 43.8387, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 600 + }, + { + "epoch": 8.133333333333333, + "grad_norm": 3.244175910949707, + "learning_rate": 8.651851851851852e-06, + "loss": 0.9522, + "step": 610 + }, + { + "epoch": 8.133333333333333, + "eval_loss": 2.2325267791748047, + "eval_runtime": 43.8391, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 610 + }, + { + "epoch": 8.266666666666667, + "grad_norm": 4.19291353225708, + "learning_rate": 7.4666666666666675e-06, + "loss": 0.9562, + "step": 620 + }, + { + "epoch": 8.266666666666667, + "eval_loss": 2.2521543502807617, + "eval_runtime": 43.8293, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 620 + }, + { + "epoch": 8.4, + "grad_norm": 2.8447179794311523, + "learning_rate": 6.2814814814814814e-06, + "loss": 1.041, + "step": 630 + }, + { + "epoch": 8.4, + "eval_loss": 2.246365785598755, + "eval_runtime": 43.8554, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 630 + }, + { + "epoch": 8.533333333333333, + "grad_norm": 3.6524436473846436, + "learning_rate": 5.096296296296297e-06, + "loss": 0.9336, + "step": 640 + }, + { + "epoch": 8.533333333333333, + "eval_loss": 2.2426538467407227, + "eval_runtime": 43.8259, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 640 + }, + { + "epoch": 8.666666666666666, + "grad_norm": 3.7157490253448486, + "learning_rate": 3.911111111111112e-06, + "loss": 0.9294, + "step": 650 + }, + { + "epoch": 8.666666666666666, + "eval_loss": 2.24678897857666, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 650 + }, + { + "epoch": 8.8, + "grad_norm": 3.1643199920654297, + "learning_rate": 2.7259259259259264e-06, + "loss": 0.948, + "step": 660 + }, + { + "epoch": 8.8, + "eval_loss": 2.249847173690796, + "eval_runtime": 43.8277, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 660 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.081480751087616e+17, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git 
a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-660/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..2256eca56c3ee142abdbcf7b5226b307364e9fb9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:646fe42ba8acca81717a6ec3128b5da944be05c77d13ef9e2f16b74c2c4a6bb6 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..e292d979dbe0e1e4a7fc96c174db8d656e4f3967 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:6d9a73d9e4ed41d6cebf3291fa617797e14dd50a1ddc73be63e4c3dbc7321939 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..b50ed8357a00070f99a52843c3e3d150dbd5b1aa --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5bb0850ed44e50e4ccb2afc9aab9a80c17a31208454b069930105956f7f9a183 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..24263876645b2013de9f019d3f2d115bdede9a82 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:73ca28b3f31c5277f541286a97387f5e372cfe0a9164bb3bbb04d3e6f5d99bcf +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..086e88ccfc80bd842f39bd94f86b11645961f217 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/trainer_state.json @@ -0,0 +1,1038 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 8.933333333333334, + "eval_steps": 10, + "global_step": 670, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + 
"grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 
1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + "eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, 
+ { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, + "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + 
"eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + "eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + 
"eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.045117139816284, + "learning_rate": 2.038518518518519e-05, + "loss": 1.0651, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 2.149211883544922, + "eval_runtime": 43.8417, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 3.004579782485962, + "learning_rate": 1.9200000000000003e-05, + "loss": 1.1477, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 2.15095853805542, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 3.0588927268981934, + "learning_rate": 1.8014814814814817e-05, + "loss": 1.1034, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 2.1752231121063232, + "eval_runtime": 43.8223, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 3.6574132442474365, + "learning_rate": 1.682962962962963e-05, + "loss": 0.9523, + "step": 540 + }, + { + "epoch": 7.2, + "eval_loss": 2.224954128265381, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.6502890586853027, + "learning_rate": 1.5644444444444448e-05, + "loss": 0.9977, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 2.2001636028289795, + "eval_runtime": 43.8171, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.5509214401245117, + "learning_rate": 1.445925925925926e-05, + "loss": 1.0191, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 2.206838846206665, + "eval_runtime": 43.8328, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.4937808513641357, + "learning_rate": 1.3274074074074074e-05, + "loss": 0.9302, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 2.2026851177215576, + "eval_runtime": 43.8326, + "eval_samples_per_second": 22.814, + 
"eval_steps_per_second": 2.852, + "step": 570 + }, + { + "epoch": 7.733333333333333, + "grad_norm": 3.4799985885620117, + "learning_rate": 1.208888888888889e-05, + "loss": 1.0328, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 2.202017068862915, + "eval_runtime": 43.8401, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 3.961993932723999, + "learning_rate": 1.0903703703703706e-05, + "loss": 1.0468, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 2.2057199478149414, + "eval_runtime": 43.8449, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 590 + }, + { + "epoch": 8.0, + "grad_norm": 3.0407888889312744, + "learning_rate": 9.837037037037038e-06, + "loss": 1.0029, + "step": 600 + }, + { + "epoch": 8.0, + "eval_loss": 2.2097363471984863, + "eval_runtime": 43.8387, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 600 + }, + { + "epoch": 8.133333333333333, + "grad_norm": 3.244175910949707, + "learning_rate": 8.651851851851852e-06, + "loss": 0.9522, + "step": 610 + }, + { + "epoch": 8.133333333333333, + "eval_loss": 2.2325267791748047, + "eval_runtime": 43.8391, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 610 + }, + { + "epoch": 8.266666666666667, + "grad_norm": 4.19291353225708, + "learning_rate": 7.4666666666666675e-06, + "loss": 0.9562, + "step": 620 + }, + { + "epoch": 8.266666666666667, + "eval_loss": 2.2521543502807617, + "eval_runtime": 43.8293, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 620 + }, + { + "epoch": 8.4, + "grad_norm": 2.8447179794311523, + "learning_rate": 6.2814814814814814e-06, + "loss": 1.041, + "step": 630 + }, + { + "epoch": 8.4, + "eval_loss": 2.246365785598755, + "eval_runtime": 43.8554, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 630 + }, + { + "epoch": 8.533333333333333, + "grad_norm": 3.6524436473846436, + "learning_rate": 5.096296296296297e-06, + "loss": 0.9336, + "step": 640 + }, + { + "epoch": 8.533333333333333, + "eval_loss": 2.2426538467407227, + "eval_runtime": 43.8259, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 640 + }, + { + "epoch": 8.666666666666666, + "grad_norm": 3.7157490253448486, + "learning_rate": 3.911111111111112e-06, + "loss": 0.9294, + "step": 650 + }, + { + "epoch": 8.666666666666666, + "eval_loss": 2.24678897857666, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 650 + }, + { + "epoch": 8.8, + "grad_norm": 3.1643199920654297, + "learning_rate": 2.7259259259259264e-06, + "loss": 0.948, + "step": 660 + }, + { + "epoch": 8.8, + "eval_loss": 2.249847173690796, + "eval_runtime": 43.8277, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 660 + }, + { + "epoch": 8.933333333333334, + "grad_norm": 3.1833741664886475, + "learning_rate": 1.540740740740741e-06, + "loss": 0.879, + "step": 670 + }, + { + "epoch": 8.933333333333334, + "eval_loss": 2.249255895614624, + "eval_runtime": 43.8327, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 670 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": 
false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.097866823073792e+17, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-670/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. 
+ +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/adapter_model.safetensors 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..4bd477641949ec41e4a76667f07ab1a8c679e088 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:4392c7b9cc258e0048910f0efda48e2e5286401048e6e1ec93cee383cd6e1bc3 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..0420cddfdb7d1f3fa991bcc7fe549a4229c60edc --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:00cff1e41b05ec7fa2d4d3b32667eb4fccdc8867b55738a3d27c007a04784d1c +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..bb61823d0d78956427b74dd1a3fc741ba1b2381f --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:c44717b587bf877ea1a37c7f5747a93e45e34ce231c845a31a9b8a042ee22593 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..da5a596d4798dd8cccd4411b87defd20b9e34e90 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:8943e15a1a78fecf3dd0b6c82ef3472edb01aa3fe4391a40fa171a8e18e18fd1 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..24f54dc250ea1b5da535d02251e6bc347c508ffc --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/trainer_state.json @@ -0,0 +1,1038 @@ +{ + "best_metric": 
1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 9.0, + "eval_steps": 10, + "global_step": 675, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + }, + { + "epoch": 1.3333333333333333, + "grad_norm": 0.5338501930236816, + "learning_rate": 6.826666666666668e-05, + "loss": 1.5858, + "step": 100 + }, + { + "epoch": 1.3333333333333333, + "eval_loss": 1.7733956575393677, + "eval_runtime": 43.8348, + 
"eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 100 + }, + { + "epoch": 1.4666666666666668, + "grad_norm": 0.5755787491798401, + "learning_rate": 6.720000000000001e-05, + "loss": 1.6722, + "step": 110 + }, + { + "epoch": 1.4666666666666668, + "eval_loss": 1.7754428386688232, + "eval_runtime": 43.8506, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 110 + }, + { + "epoch": 1.6, + "grad_norm": 0.5732030868530273, + "learning_rate": 6.601481481481482e-05, + "loss": 1.7101, + "step": 120 + }, + { + "epoch": 1.6, + "eval_loss": 1.7777600288391113, + "eval_runtime": 43.8177, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 120 + }, + { + "epoch": 1.7333333333333334, + "grad_norm": 0.6674457788467407, + "learning_rate": 6.482962962962964e-05, + "loss": 1.6537, + "step": 130 + }, + { + "epoch": 1.7333333333333334, + "eval_loss": 1.77739679813385, + "eval_runtime": 43.8286, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 130 + }, + { + "epoch": 1.8666666666666667, + "grad_norm": 0.7166047096252441, + "learning_rate": 6.364444444444445e-05, + "loss": 1.7626, + "step": 140 + }, + { + "epoch": 1.8666666666666667, + "eval_loss": 1.7785311937332153, + "eval_runtime": 43.8238, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 140 + }, + { + "epoch": 2.0, + "grad_norm": 0.7089148163795471, + "learning_rate": 6.245925925925926e-05, + "loss": 1.6144, + "step": 150 + }, + { + "epoch": 2.0, + "eval_loss": 1.7784926891326904, + "eval_runtime": 43.8272, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 150 + }, + { + "epoch": 2.1333333333333333, + "grad_norm": 0.8039184808731079, + "learning_rate": 6.127407407407407e-05, + "loss": 1.6737, + "step": 160 + }, + { + "epoch": 2.1333333333333333, + "eval_loss": 1.7979527711868286, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 160 + }, + { + "epoch": 2.2666666666666666, + "grad_norm": 1.226765513420105, + "learning_rate": 6.008888888888889e-05, + "loss": 1.5733, + "step": 170 + }, + { + "epoch": 2.2666666666666666, + "eval_loss": 1.8257368803024292, + "eval_runtime": 43.8351, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 170 + }, + { + "epoch": 2.4, + "grad_norm": 1.0805286169052124, + "learning_rate": 5.902222222222222e-05, + "loss": 1.4639, + "step": 180 + }, + { + "epoch": 2.4, + "eval_loss": 1.8204447031021118, + "eval_runtime": 43.8307, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 180 + }, + { + "epoch": 2.533333333333333, + "grad_norm": 1.2623231410980225, + "learning_rate": 5.783703703703704e-05, + "loss": 1.6045, + "step": 190 + }, + { + "epoch": 2.533333333333333, + "eval_loss": 1.8204751014709473, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 190 + }, + { + "epoch": 2.6666666666666665, + "grad_norm": 1.3067922592163086, + "learning_rate": 5.665185185185186e-05, + "loss": 1.5462, + "step": 200 + }, + { + "epoch": 2.6666666666666665, + "eval_loss": 1.828981876373291, + "eval_runtime": 43.8239, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 200 + }, + { + "epoch": 2.8, + "grad_norm": 1.2507699728012085, + "learning_rate": 5.5466666666666675e-05, + "loss": 1.4641, + "step": 210 + }, + { + "epoch": 2.8, + "eval_loss": 1.8268734216690063, + 
"eval_runtime": 43.8282, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 210 + }, + { + "epoch": 2.9333333333333336, + "grad_norm": 1.2686301469802856, + "learning_rate": 5.4281481481481486e-05, + "loss": 1.5211, + "step": 220 + }, + { + "epoch": 2.9333333333333336, + "eval_loss": 1.8283354043960571, + "eval_runtime": 43.8292, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 220 + }, + { + "epoch": 3.066666666666667, + "grad_norm": 1.0655094385147095, + "learning_rate": 5.30962962962963e-05, + "loss": 1.4441, + "step": 230 + }, + { + "epoch": 3.066666666666667, + "eval_loss": 1.847070574760437, + "eval_runtime": 43.8372, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 230 + }, + { + "epoch": 3.2, + "grad_norm": 1.761847734451294, + "learning_rate": 5.1911111111111114e-05, + "loss": 1.4666, + "step": 240 + }, + { + "epoch": 3.2, + "eval_loss": 1.9219939708709717, + "eval_runtime": 43.8375, + "eval_samples_per_second": 22.812, + "eval_steps_per_second": 2.851, + "step": 240 + }, + { + "epoch": 3.3333333333333335, + "grad_norm": 1.8005529642105103, + "learning_rate": 5.072592592592593e-05, + "loss": 1.3647, + "step": 250 + }, + { + "epoch": 3.3333333333333335, + "eval_loss": 1.9093294143676758, + "eval_runtime": 43.8196, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 250 + }, + { + "epoch": 3.466666666666667, + "grad_norm": 1.670900821685791, + "learning_rate": 4.954074074074075e-05, + "loss": 1.3373, + "step": 260 + }, + { + "epoch": 3.466666666666667, + "eval_loss": 1.9034240245819092, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 260 + }, + { + "epoch": 3.6, + "grad_norm": 1.8329880237579346, + "learning_rate": 4.847407407407408e-05, + "loss": 1.3594, + "step": 270 + }, + { + "epoch": 3.6, + "eval_loss": 1.9157416820526123, + "eval_runtime": 43.8462, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 270 + }, + { + "epoch": 3.7333333333333334, + "grad_norm": 1.7573440074920654, + "learning_rate": 4.72888888888889e-05, + "loss": 1.4565, + "step": 280 + }, + { + "epoch": 3.7333333333333334, + "eval_loss": 1.9069511890411377, + "eval_runtime": 43.8228, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 280 + }, + { + "epoch": 3.8666666666666667, + "grad_norm": 2.1760241985321045, + "learning_rate": 4.610370370370371e-05, + "loss": 1.426, + "step": 290 + }, + { + "epoch": 3.8666666666666667, + "eval_loss": 1.911515235900879, + "eval_runtime": 43.8287, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 290 + }, + { + "epoch": 4.0, + "grad_norm": 1.9050216674804688, + "learning_rate": 4.491851851851852e-05, + "loss": 1.3027, + "step": 300 + }, + { + "epoch": 4.0, + "eval_loss": 1.9081496000289917, + "eval_runtime": 43.8176, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 300 + }, + { + "epoch": 4.133333333333334, + "grad_norm": 2.163496971130371, + "learning_rate": 4.373333333333334e-05, + "loss": 1.1819, + "step": 310 + }, + { + "epoch": 4.133333333333334, + "eval_loss": 2.0205495357513428, + "eval_runtime": 43.8308, + "eval_samples_per_second": 22.815, + "eval_steps_per_second": 2.852, + "step": 310 + }, + { + "epoch": 4.266666666666667, + "grad_norm": 2.2136807441711426, + "learning_rate": 4.266666666666667e-05, + "loss": 1.1971, + "step": 320 + }, + { + "epoch": 4.266666666666667, 
+ "eval_loss": 2.001232624053955, + "eval_runtime": 43.8321, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 320 + }, + { + "epoch": 4.4, + "grad_norm": 1.9437283277511597, + "learning_rate": 4.148148148148148e-05, + "loss": 1.2353, + "step": 330 + }, + { + "epoch": 4.4, + "eval_loss": 2.001880407333374, + "eval_runtime": 43.8146, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 330 + }, + { + "epoch": 4.533333333333333, + "grad_norm": 2.2799439430236816, + "learning_rate": 4.02962962962963e-05, + "loss": 1.2511, + "step": 340 + }, + { + "epoch": 4.533333333333333, + "eval_loss": 2.0053529739379883, + "eval_runtime": 43.8144, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 340 + }, + { + "epoch": 4.666666666666667, + "grad_norm": 2.02165150642395, + "learning_rate": 3.9111111111111115e-05, + "loss": 1.2991, + "step": 350 + }, + { + "epoch": 4.666666666666667, + "eval_loss": 1.9998481273651123, + "eval_runtime": 43.8195, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 350 + }, + { + "epoch": 4.8, + "grad_norm": 2.198478937149048, + "learning_rate": 3.792592592592593e-05, + "loss": 1.2877, + "step": 360 + }, + { + "epoch": 4.8, + "eval_loss": 1.9931644201278687, + "eval_runtime": 43.8461, + "eval_samples_per_second": 22.807, + "eval_steps_per_second": 2.851, + "step": 360 + }, + { + "epoch": 4.933333333333334, + "grad_norm": 2.3883018493652344, + "learning_rate": 3.674074074074074e-05, + "loss": 1.3245, + "step": 370 + }, + { + "epoch": 4.933333333333334, + "eval_loss": 1.9867770671844482, + "eval_runtime": 43.8496, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 370 + }, + { + "epoch": 5.066666666666666, + "grad_norm": 2.347261667251587, + "learning_rate": 3.555555555555555e-05, + "loss": 1.2478, + "step": 380 + }, + { + "epoch": 5.066666666666666, + "eval_loss": 2.0367517471313477, + "eval_runtime": 43.8491, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 380 + }, + { + "epoch": 5.2, + "grad_norm": 2.6627748012542725, + "learning_rate": 3.437037037037037e-05, + "loss": 1.1202, + "step": 390 + }, + { + "epoch": 5.2, + "eval_loss": 2.103827714920044, + "eval_runtime": 43.8256, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 390 + }, + { + "epoch": 5.333333333333333, + "grad_norm": 2.5760111808776855, + "learning_rate": 3.318518518518519e-05, + "loss": 1.0481, + "step": 400 + }, + { + "epoch": 5.333333333333333, + "eval_loss": 2.075502872467041, + "eval_runtime": 43.8158, + "eval_samples_per_second": 22.823, + "eval_steps_per_second": 2.853, + "step": 400 + }, + { + "epoch": 5.466666666666667, + "grad_norm": 3.0876505374908447, + "learning_rate": 3.211851851851852e-05, + "loss": 1.1515, + "step": 410 + }, + { + "epoch": 5.466666666666667, + "eval_loss": 2.0795600414276123, + "eval_runtime": 43.8222, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 410 + }, + { + "epoch": 5.6, + "grad_norm": 2.520632028579712, + "learning_rate": 3.093333333333334e-05, + "loss": 1.2198, + "step": 420 + }, + { + "epoch": 5.6, + "eval_loss": 2.0835771560668945, + "eval_runtime": 43.8209, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.853, + "step": 420 + }, + { + "epoch": 5.733333333333333, + "grad_norm": 2.776945114135742, + "learning_rate": 2.974814814814815e-05, + "loss": 1.1505, + "step": 430 + }, + { + "epoch": 5.733333333333333, + 
"eval_loss": 2.07682204246521, + "eval_runtime": 43.8225, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 430 + }, + { + "epoch": 5.866666666666667, + "grad_norm": 4.1550798416137695, + "learning_rate": 2.8562962962962966e-05, + "loss": 1.1495, + "step": 440 + }, + { + "epoch": 5.866666666666667, + "eval_loss": 2.074658155441284, + "eval_runtime": 43.8288, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 440 + }, + { + "epoch": 6.0, + "grad_norm": 4.296061038970947, + "learning_rate": 2.737777777777778e-05, + "loss": 1.1867, + "step": 450 + }, + { + "epoch": 6.0, + "eval_loss": 2.0737810134887695, + "eval_runtime": 43.8279, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 450 + }, + { + "epoch": 6.133333333333334, + "grad_norm": 3.2808094024658203, + "learning_rate": 2.6192592592592597e-05, + "loss": 1.0536, + "step": 460 + }, + { + "epoch": 6.133333333333334, + "eval_loss": 2.1731176376342773, + "eval_runtime": 43.8381, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 460 + }, + { + "epoch": 6.266666666666667, + "grad_norm": 3.035400629043579, + "learning_rate": 2.5007407407407408e-05, + "loss": 1.0091, + "step": 470 + }, + { + "epoch": 6.266666666666667, + "eval_loss": 2.1523854732513428, + "eval_runtime": 43.8544, + "eval_samples_per_second": 22.803, + "eval_steps_per_second": 2.85, + "step": 470 + }, + { + "epoch": 6.4, + "grad_norm": 2.89253830909729, + "learning_rate": 2.382222222222222e-05, + "loss": 1.0155, + "step": 480 + }, + { + "epoch": 6.4, + "eval_loss": 2.144387722015381, + "eval_runtime": 43.8398, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 480 + }, + { + "epoch": 6.533333333333333, + "grad_norm": 3.1794967651367188, + "learning_rate": 2.263703703703704e-05, + "loss": 1.0635, + "step": 490 + }, + { + "epoch": 6.533333333333333, + "eval_loss": 2.1576201915740967, + "eval_runtime": 43.8508, + "eval_samples_per_second": 22.805, + "eval_steps_per_second": 2.851, + "step": 490 + }, + { + "epoch": 6.666666666666667, + "grad_norm": 3.1008200645446777, + "learning_rate": 2.157037037037037e-05, + "loss": 1.0641, + "step": 500 + }, + { + "epoch": 6.666666666666667, + "eval_loss": 2.1437439918518066, + "eval_runtime": 43.8399, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 500 + }, + { + "epoch": 6.8, + "grad_norm": 3.045117139816284, + "learning_rate": 2.038518518518519e-05, + "loss": 1.0651, + "step": 510 + }, + { + "epoch": 6.8, + "eval_loss": 2.149211883544922, + "eval_runtime": 43.8417, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 510 + }, + { + "epoch": 6.933333333333334, + "grad_norm": 3.004579782485962, + "learning_rate": 1.9200000000000003e-05, + "loss": 1.1477, + "step": 520 + }, + { + "epoch": 6.933333333333334, + "eval_loss": 2.15095853805542, + "eval_runtime": 43.8347, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 520 + }, + { + "epoch": 7.066666666666666, + "grad_norm": 3.0588927268981934, + "learning_rate": 1.8014814814814817e-05, + "loss": 1.1034, + "step": 530 + }, + { + "epoch": 7.066666666666666, + "eval_loss": 2.1752231121063232, + "eval_runtime": 43.8223, + "eval_samples_per_second": 22.819, + "eval_steps_per_second": 2.852, + "step": 530 + }, + { + "epoch": 7.2, + "grad_norm": 3.6574132442474365, + "learning_rate": 1.682962962962963e-05, + "loss": 0.9523, + "step": 540 + }, + { + "epoch": 7.2, + 
"eval_loss": 2.224954128265381, + "eval_runtime": 43.8341, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 540 + }, + { + "epoch": 7.333333333333333, + "grad_norm": 2.6502890586853027, + "learning_rate": 1.5644444444444448e-05, + "loss": 0.9977, + "step": 550 + }, + { + "epoch": 7.333333333333333, + "eval_loss": 2.2001636028289795, + "eval_runtime": 43.8171, + "eval_samples_per_second": 22.822, + "eval_steps_per_second": 2.853, + "step": 550 + }, + { + "epoch": 7.466666666666667, + "grad_norm": 3.5509214401245117, + "learning_rate": 1.445925925925926e-05, + "loss": 1.0191, + "step": 560 + }, + { + "epoch": 7.466666666666667, + "eval_loss": 2.206838846206665, + "eval_runtime": 43.8328, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 560 + }, + { + "epoch": 7.6, + "grad_norm": 3.4937808513641357, + "learning_rate": 1.3274074074074074e-05, + "loss": 0.9302, + "step": 570 + }, + { + "epoch": 7.6, + "eval_loss": 2.2026851177215576, + "eval_runtime": 43.8326, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 570 + }, + { + "epoch": 7.733333333333333, + "grad_norm": 3.4799985885620117, + "learning_rate": 1.208888888888889e-05, + "loss": 1.0328, + "step": 580 + }, + { + "epoch": 7.733333333333333, + "eval_loss": 2.202017068862915, + "eval_runtime": 43.8401, + "eval_samples_per_second": 22.81, + "eval_steps_per_second": 2.851, + "step": 580 + }, + { + "epoch": 7.866666666666667, + "grad_norm": 3.961993932723999, + "learning_rate": 1.0903703703703706e-05, + "loss": 1.0468, + "step": 590 + }, + { + "epoch": 7.866666666666667, + "eval_loss": 2.2057199478149414, + "eval_runtime": 43.8449, + "eval_samples_per_second": 22.808, + "eval_steps_per_second": 2.851, + "step": 590 + }, + { + "epoch": 8.0, + "grad_norm": 3.0407888889312744, + "learning_rate": 9.837037037037038e-06, + "loss": 1.0029, + "step": 600 + }, + { + "epoch": 8.0, + "eval_loss": 2.2097363471984863, + "eval_runtime": 43.8387, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 600 + }, + { + "epoch": 8.133333333333333, + "grad_norm": 3.244175910949707, + "learning_rate": 8.651851851851852e-06, + "loss": 0.9522, + "step": 610 + }, + { + "epoch": 8.133333333333333, + "eval_loss": 2.2325267791748047, + "eval_runtime": 43.8391, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 610 + }, + { + "epoch": 8.266666666666667, + "grad_norm": 4.19291353225708, + "learning_rate": 7.4666666666666675e-06, + "loss": 0.9562, + "step": 620 + }, + { + "epoch": 8.266666666666667, + "eval_loss": 2.2521543502807617, + "eval_runtime": 43.8293, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 620 + }, + { + "epoch": 8.4, + "grad_norm": 2.8447179794311523, + "learning_rate": 6.2814814814814814e-06, + "loss": 1.041, + "step": 630 + }, + { + "epoch": 8.4, + "eval_loss": 2.246365785598755, + "eval_runtime": 43.8554, + "eval_samples_per_second": 22.802, + "eval_steps_per_second": 2.85, + "step": 630 + }, + { + "epoch": 8.533333333333333, + "grad_norm": 3.6524436473846436, + "learning_rate": 5.096296296296297e-06, + "loss": 0.9336, + "step": 640 + }, + { + "epoch": 8.533333333333333, + "eval_loss": 2.2426538467407227, + "eval_runtime": 43.8259, + "eval_samples_per_second": 22.818, + "eval_steps_per_second": 2.852, + "step": 640 + }, + { + "epoch": 8.666666666666666, + "grad_norm": 3.7157490253448486, + "learning_rate": 3.911111111111112e-06, + "loss": 0.9294, + "step": 650 + }, + { + 
"epoch": 8.666666666666666, + "eval_loss": 2.24678897857666, + "eval_runtime": 43.8423, + "eval_samples_per_second": 22.809, + "eval_steps_per_second": 2.851, + "step": 650 + }, + { + "epoch": 8.8, + "grad_norm": 3.1643199920654297, + "learning_rate": 2.7259259259259264e-06, + "loss": 0.948, + "step": 660 + }, + { + "epoch": 8.8, + "eval_loss": 2.249847173690796, + "eval_runtime": 43.8277, + "eval_samples_per_second": 22.817, + "eval_steps_per_second": 2.852, + "step": 660 + }, + { + "epoch": 8.933333333333334, + "grad_norm": 3.1833741664886475, + "learning_rate": 1.540740740740741e-06, + "loss": 0.879, + "step": 670 + }, + { + "epoch": 8.933333333333334, + "eval_loss": 2.249255895614624, + "eval_runtime": 43.8327, + "eval_samples_per_second": 22.814, + "eval_steps_per_second": 2.852, + "step": 670 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": true + }, + "attributes": {} + } + }, + "total_flos": 1.10605985906688e+17, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-675/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## 
Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). + +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end 
of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..9c4a7acf33dc93fda66004db6a1cdf14349c88d6 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:b453cdfd2520c03d2d81ae0b1ad66085228f18452a550dfbbb3f0f2842c3bea3 +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..47fd2b91797167ef32bbeedde4b2522cd3cd7e85 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:9edec5cfafd82d02c57be8172f60b500f51f201da7f44cdf1662e79e9fcd4be9 +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..2b1c959e3b92a9d3847cd61e595c79a1813cfe3a --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0bf8faccd3d2ca94b80304c3092e394e13d076f35c0c4f51d74490ac3412d5f9 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..4e0b845352c058c10456242d7048575bb3ed9ed9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ea49100ba0a4f3150de9cb995c7912874f39ba5fc6da892eafe82565fe347b82 +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..df6d3b5987de0af66fc830e0b11dac9e8492afdd --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/trainer_state.json @@ -0,0 +1,138 @@ +{ + "best_metric": 1.768637776374817, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70", + "epoch": 0.9333333333333333, + "eval_steps": 10, + "global_step": 70, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.14702503903232e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/training_args.bin 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-70/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
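The hardware, runtime, and region fields below are still unfilled, so no emissions figure can be reported for this run. As a hedged illustration of how the calculator's estimate is formed, emissions scale roughly as device power draw × hours used × grid carbon intensity × datacenter PUE; every number in the sketch below is a hypothetical placeholder, not a measurement from this training run:

```python
# Rough CO2e estimate in the spirit of the ML Impact calculator (Lacoste et al., 2019).
# All inputs are hypothetical placeholders, not measured values for this run.
def estimate_co2e_kg(device_power_kw: float, hours: float,
                     grid_kgco2e_per_kwh: float, pue: float = 1.1) -> float:
    """Energy drawn (kWh, scaled by datacenter PUE) times the grid's carbon intensity."""
    return device_power_kw * hours * pue * grid_kgco2e_per_kwh

# Example: one ~0.3 kW accelerator for 10 hours on a 0.4 kgCO2e/kWh grid.
print(f"~{estimate_co2e_kg(0.3, 10, 0.4):.1f} kg CO2e")
```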
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..ec965a973415021f2c652fbdee8ba9d28c64d019 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:dab4ab030ae70054acec18175d73c141430964d6cd8fd46193235211594d565b +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..e2cd42e541096540c1d9022950c4743848354cf6 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:09294453d0e6465052598c1a57dfd15d258a58ebc1dd7c7b05584edb31b809cb +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..0b228b8e8106f666fe286c5d131d496d926a7df4 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:debbe8bbbf3d0dfd719072ab48974c332b6f78ebe25ef99f5002c8d0a8c8c380 +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..1248c444261af6164ab0c09bf0d8ff5b4162cf1c --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:ca68de17dd1fdc64e6f79214024f511b4e51a08963f5ee15b680160138e4c50d +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..ea6cdc9c4e288affdb28c23a767d2b054dc5ce26 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/trainer_state.json @@ -0,0 +1,153 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 1.0666666666666667, + "eval_steps": 10, + "global_step": 80, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 
0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.31088575889408e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/README.md b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e255b1d99c1c1d12955d852dc1056813be7ffca0 
--- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/README.md @@ -0,0 +1,202 @@ +--- +base_model: /workspace/pythia-6_9b +library_name: peft +--- + +# Model Card for Model ID + + + + + +## Model Details + +### Model Description + + + + + +- **Developed by:** [More Information Needed] +- **Funded by [optional]:** [More Information Needed] +- **Shared by [optional]:** [More Information Needed] +- **Model type:** [More Information Needed] +- **Language(s) (NLP):** [More Information Needed] +- **License:** [More Information Needed] +- **Finetuned from model [optional]:** [More Information Needed] + +### Model Sources [optional] + + + +- **Repository:** [More Information Needed] +- **Paper [optional]:** [More Information Needed] +- **Demo [optional]:** [More Information Needed] + +## Uses + + + +### Direct Use + + + +[More Information Needed] + +### Downstream Use [optional] + + + +[More Information Needed] + +### Out-of-Scope Use + + + +[More Information Needed] + +## Bias, Risks, and Limitations + + + +[More Information Needed] + +### Recommendations + + + +Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. + +## How to Get Started with the Model + +Use the code below to get started with the model. + +[More Information Needed] + +## Training Details + +### Training Data + + + +[More Information Needed] + +### Training Procedure + + + +#### Preprocessing [optional] + +[More Information Needed] + + +#### Training Hyperparameters + +- **Training regime:** [More Information Needed] + +#### Speeds, Sizes, Times [optional] + + + +[More Information Needed] + +## Evaluation + + + +### Testing Data, Factors & Metrics + +#### Testing Data + + + +[More Information Needed] + +#### Factors + + + +[More Information Needed] + +#### Metrics + + + +[More Information Needed] + +### Results + +[More Information Needed] + +#### Summary + + + +## Model Examination [optional] + + + +[More Information Needed] + +## Environmental Impact + + + +Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). 
+ +- **Hardware Type:** [More Information Needed] +- **Hours used:** [More Information Needed] +- **Cloud Provider:** [More Information Needed] +- **Compute Region:** [More Information Needed] +- **Carbon Emitted:** [More Information Needed] + +## Technical Specifications [optional] + +### Model Architecture and Objective + +[More Information Needed] + +### Compute Infrastructure + +[More Information Needed] + +#### Hardware + +[More Information Needed] + +#### Software + +[More Information Needed] + +## Citation [optional] + + + +**BibTeX:** + +[More Information Needed] + +**APA:** + +[More Information Needed] + +## Glossary [optional] + + + +[More Information Needed] + +## More Information [optional] + +[More Information Needed] + +## Model Card Authors [optional] + +[More Information Needed] + +## Model Card Contact + +[More Information Needed] +### Framework versions + +- PEFT 0.13.2 \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/adapter_config.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/adapter_config.json new file mode 100644 index 0000000000000000000000000000000000000000..2dac7d45378cb5fa31de4db4886fb4f63ba5fcc9 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/adapter_config.json @@ -0,0 +1,31 @@ +{ + "alpha_pattern": {}, + "auto_mapping": null, + "base_model_name_or_path": "/workspace/pythia-6_9b", + "bias": "none", + "fan_in_fan_out": false, + "inference_mode": true, + "init_lora_weights": true, + "layer_replication": null, + "layers_pattern": null, + "layers_to_transform": null, + "loftq_config": {}, + "lora_alpha": 32, + "lora_dropout": 0.1, + "megatron_config": null, + "megatron_core": "megatron.core", + "modules_to_save": null, + "peft_type": "LORA", + "r": 8, + "rank_pattern": {}, + "revision": null, + "target_modules": [ + "dense", + "dense_4h_to_h", + "query_key_value", + "dense_h_to_4h" + ], + "task_type": "CAUSAL_LM", + "use_dora": false, + "use_rslora": false +} \ No newline at end of file diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/adapter_model.safetensors b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/adapter_model.safetensors new file mode 100644 index 0000000000000000000000000000000000000000..6db7a49443f5e1e4b5907ec41352c765d808f4cc --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/adapter_model.safetensors @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a519e556d82571d4e2ed1a22f960372eb355512671b47b38dcfb88b04226a61d +size 67144544 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/optimizer.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/optimizer.pt new file mode 100644 index 0000000000000000000000000000000000000000..f32575e938b2d8bb300ccaa9331ee37fdb491bb6 --- /dev/null +++ 
b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/optimizer.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:f0df38cb172b00e6dfefa6ddba9daf102ee703737e53dffbcb8bd5c091f417cb +size 134432453 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/rng_state.pth b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/rng_state.pth new file mode 100644 index 0000000000000000000000000000000000000000..4041231f7cc289aaec627b941b3ce1ed104a3678 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/rng_state.pth @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:5e1884689751e2c9aa53b83d7472089621e5727e27a037b479e2287c7b208b1a +size 14575 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/scheduler.pt b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/scheduler.pt new file mode 100644 index 0000000000000000000000000000000000000000..0b0f75e4466fe4f0b483cc75680b88de64a7d8b3 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/scheduler.pt @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:fb50ab812a3208851daa78a03dc3881c17f4294549fbb62565d478b4ff3eccbf +size 627 diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/trainer_state.json b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/trainer_state.json new file mode 100644 index 0000000000000000000000000000000000000000..96eab89fa7d433893eb22729ede48e7493233cd8 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/trainer_state.json @@ -0,0 +1,168 @@ +{ + "best_metric": 1.7681938409805298, + "best_model_checkpoint": "./output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-80", + "epoch": 1.2, + "eval_steps": 10, + "global_step": 90, + "is_hyper_param_search": false, + "is_local_process_zero": true, + "is_world_process_zero": true, + "log_history": [ + { + "epoch": 0.13333333333333333, + "grad_norm": 0.43416136503219604, + "learning_rate": 7.881481481481482e-05, + "loss": 1.8299, + "step": 10 + }, + { + "epoch": 0.13333333333333333, + "eval_loss": 1.8035988807678223, + "eval_runtime": 43.7951, + "eval_samples_per_second": 22.834, + "eval_steps_per_second": 2.854, + "step": 10 + }, + { + "epoch": 0.26666666666666666, + "grad_norm": 0.4365692734718323, + "learning_rate": 7.774814814814816e-05, + "loss": 1.6959, + "step": 20 + }, + { + "epoch": 0.26666666666666666, + "eval_loss": 1.781802773475647, + "eval_runtime": 43.8289, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 20 + }, + { + "epoch": 0.4, + "grad_norm": 
0.4519215226173401, + "learning_rate": 7.656296296296297e-05, + "loss": 1.8596, + "step": 30 + }, + { + "epoch": 0.4, + "eval_loss": 1.774838924407959, + "eval_runtime": 43.8143, + "eval_samples_per_second": 22.824, + "eval_steps_per_second": 2.853, + "step": 30 + }, + { + "epoch": 0.5333333333333333, + "grad_norm": 0.4022461473941803, + "learning_rate": 7.537777777777778e-05, + "loss": 1.7206, + "step": 40 + }, + { + "epoch": 0.5333333333333333, + "eval_loss": 1.773160457611084, + "eval_runtime": 43.8283, + "eval_samples_per_second": 22.816, + "eval_steps_per_second": 2.852, + "step": 40 + }, + { + "epoch": 0.6666666666666666, + "grad_norm": 0.38469013571739197, + "learning_rate": 7.41925925925926e-05, + "loss": 1.6842, + "step": 50 + }, + { + "epoch": 0.6666666666666666, + "eval_loss": 1.7716821432113647, + "eval_runtime": 43.8125, + "eval_samples_per_second": 22.825, + "eval_steps_per_second": 2.853, + "step": 50 + }, + { + "epoch": 0.8, + "grad_norm": 0.4017268717288971, + "learning_rate": 7.300740740740741e-05, + "loss": 1.6581, + "step": 60 + }, + { + "epoch": 0.8, + "eval_loss": 1.7701489925384521, + "eval_runtime": 43.8384, + "eval_samples_per_second": 22.811, + "eval_steps_per_second": 2.851, + "step": 60 + }, + { + "epoch": 0.9333333333333333, + "grad_norm": 0.3455270230770111, + "learning_rate": 7.182222222222222e-05, + "loss": 1.7123, + "step": 70 + }, + { + "epoch": 0.9333333333333333, + "eval_loss": 1.768637776374817, + "eval_runtime": 43.8188, + "eval_samples_per_second": 22.821, + "eval_steps_per_second": 2.853, + "step": 70 + }, + { + "epoch": 1.0666666666666667, + "grad_norm": 0.3520221710205078, + "learning_rate": 7.063703703703705e-05, + "loss": 1.7879, + "step": 80 + }, + { + "epoch": 1.0666666666666667, + "eval_loss": 1.7681938409805298, + "eval_runtime": 43.8213, + "eval_samples_per_second": 22.82, + "eval_steps_per_second": 2.852, + "step": 80 + }, + { + "epoch": 1.2, + "grad_norm": 0.43097180128097534, + "learning_rate": 6.945185185185186e-05, + "loss": 1.6107, + "step": 90 + }, + { + "epoch": 1.2, + "eval_loss": 1.7710144519805908, + "eval_runtime": 43.834, + "eval_samples_per_second": 22.813, + "eval_steps_per_second": 2.852, + "step": 90 + } + ], + "logging_steps": 10, + "max_steps": 675, + "num_input_tokens_seen": 0, + "num_train_epochs": 9, + "save_steps": 10, + "stateful_callbacks": { + "TrainerControl": { + "args": { + "should_epoch_stop": false, + "should_evaluate": false, + "should_log": false, + "should_save": true, + "should_training_stop": false + }, + "attributes": {} + } + }, + "total_flos": 1.47474647875584e+16, + "train_batch_size": 8, + "trial_name": null, + "trial_params": null +} diff --git a/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/training_args.bin b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/training_args.bin new file mode 100644 index 0000000000000000000000000000000000000000..d2520f43dac84301e9f60ec94c3299298bd6c378 --- /dev/null +++ b/output_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-nonmember-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05/checkpoint-90/training_args.bin @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:2147d51347048dde2ede720ba0c3e7bd0ac3ee83ad3478587160144bc2f60501 +size 4859 diff --git 
a/responses_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05-ft.jsonl b/responses_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05-ft.jsonl new file mode 100644 index 0000000000000000000000000000000000000000..776cd4bdc0fab6e1d5f0ff6f1cd43e2da9f60a3e --- /dev/null +++ b/responses_ft_more_layers_stackexchange_epoch_9_mlp/pythia-6_9b-member-6_9b-epoch-9-pile-full-600-subsets-stackexchange-8e-05-ft.jsonl @@ -0,0 +1,1000 @@ +{"output_text": "}\n\n /**\n * Set Address\n *\n * @param string $address\n * @return UserAddress\n */\n public function setAddress($address)\n {\n $this->Address = $address;\n\n return $this;\n }\n\n /**\n * Get Address\n *\n * @return string \n */\n public function getAddress()\n {\n return $this->Address;\n", "input_text": " $UserAddressType;\n\n /**\n * Get ID\n *\n * @return integer \n */\n public function getID()\n {\n return $this->ID;\n }\n\n /**\n * Set UserId\n *\n * @param integer $userId\n * @return UserAddress\n */\n public function setUserId($userId)\n {\n $this->UserId = $userId;\n\n return $this;\n }\n\n /**\n * Get UserId\n *\n * @return integer \n */\n public function getUserId()\n {\n return $this->UserId;\n }\n\n /**\n * Set LevelId\n *\n * @param integer $levelId\n * @return UserAddress\n */\n public function setLevelId($levelId)\n {\n $this->LevelId = $levelId;\n\n return $this;\n }\n\n /**\n * Get LevelId\n *\n * @return integer \n */\n public function getLevelId()\n {\n return $this->LevelId;\n }\n\n /**\n * Set AddressTypeId\n *\n * @param integer $addressTypeId\n * @return UserAddress\n */\n public function setAddressTypeId($addressTypeId)\n {\n $this->AddressTypeId = $addressTypeId;\n\n return $this;\n }\n\n /**\n * Get AddressTypeId\n *\n * @return integer \n */\n public function getAddressTypeId()\n {\n return $this->AddressTypeId;\n "} +{"output_text": " y : -7.985\nI/touchCheck( 1721): MeNano's ACTION_UP - x : 62.880 y : -7.985\nI/touchCheck( 1721): MeNano's ACTION_MOVE - x : 62.880 y : -7.985\nI/touchCheck( 1721): MeNano's ACTION_UP - x : 62.880 y : -7.985\nI/touchCheck", "input_text": "krkrk _sensorEnable SENSORS_ACCELERATION on aa(1)\nD/LGESensors( 1721): skykrkrk control__set_delay: dev=0x336ee8 delay-ms=200 hSen\nsor=0\nE/LockPatternKeyguardView( 1721): onScreenTurnedOn()\nE/LockPatternKeyguardView( 1721): doesFallbackUnlockScreenExist()\nE/LockPatternKeyguardView( 1721): isVerifyUnlockOnly()\nI/touchCheck( 1721): WMS's KEY event : KeyEvent{action=1 code=26 repeat=0 meta=0\n scancode=116 chars=null mFlags=8}\nI/touchCheck( 1721): ViewRoot's Key event KeyEvent{action=1 code=26 repeat=0 met\na=0 scancode=116 chars=null mFlags=8}\nD/StatusBarPolicy( 1721): updateClock(), time = 7:13 PM\nD/Andy_DigitalClock( 1721): updateTime(), time = 7:13\nD/GsmDataConnectionTracker( 1798): [GsmDataConnectionTracker] onDataStateChanged\n(): 0 data calls received\nD/Andy_DigitalClock( 1721): updateTime(), time = 7:13\nI/PowerManagerService( 1721): stepLocked animation finished mask:2 curIntValue:1\n06\nD/SurfaceFlinger( 1721): DisplayEventThread Screen about to return, flinger = 0x\n140a48\nD/SurfaceFlinger( 1721): DisplayHardwarebase::acquireScreen() screen returned\nI/touchCheck( 1721): MeNano's ACTION_DOWN - x : 62.880"} +{"output_text": " always be true.\nI've tried setting the value to true, false, true, false, true, false, true, false, true, false, true, false, true, false, true, false, true, false, true, false, true, 
false, true, false, true, false, true, false, true, false, true, false, true, false, true, false, true, false, true, false, true, false, true, false,", "input_text": " valid time is test for it, and make sure that test is in play.\nvalid(HH:MM) :- between(0, 24, HH), between(0, 60, MM).\n\nThis predicate affirms the validity of a time value. If an invalid time value is passed in, it will fail. Try it. Now let's plug it into our other time predicates:\ndaytime(Time) :- \n valid(Time), 6:01 @=< Time, Time @=< 21:59.\nnighttime(Time) :- \n valid(Time), \n ((22:00 @=< Time, Time @=< 24:00) ; \n (0 @=< Time, Time @=< 6:00)).\n\nThat should be the only change you need to make.\n\n Q:\n\nHow to audit logging categories in Diagnostic settings via Azure policy?\n\nI have a policy to audit when a diagnostic settings with a specific configuration for a particular Azure service does not exist. This would be applied to several services (Event Hub, Key Vault, Postgres Single Server), to ensure a pre-defined logging configuration is in place.\nThe problem I'm facing is at auditing that specific logging categories are selected in each diagnostic settings. Using Postgres as an example, which has these logging categories:\n\nPostgreSQLLogs\nQueryStoreRuntimeStatistics\nQueryStoreWaitStatistics\n\nAnd say that I want to enforce only \"PostgreSQLLogs\" is selected in my DS. Using the policy alias \"Microsoft.Insights/diagnosticSettings/logs[*].enabled\" this would imply a configuration like: [true, false, false]. So, this is the value that I should set-up in my policy to audit it for compliance. But whichever value I try in my policy will"} +{"output_text": "c:\\program files (x86)\\microsoft visual studio 14.0\\vc\\include\\xutility(1240): error C2280: attempting to reference a deleted function (union, struct, copy constructor)\n1>c:\\program files (x86)\\microsoft visual studio 14.0\\vc\\include\\xutility(1240): error C2280: attempting to reference a deleted function (union, struct, copy constructor)\n1>c:\\program files", "input_text": "\n)\n\nBuilding a JsonSchemaGenerator:\nJsonSchemaGenerator jsonSchemaGenerator = new JsonSchemaGenerator(objectMapper, config);\nClass mainClassObject =...;\nJsonNode jsonSchema = jsonSchemaGenerator.generateJsonSchema(mainClassObject);\n\n Q:\n\nC2280: attempting to reference a deleted function (union, struct, copy constructor)\n\nI have a problem with misleading error messages, when I try to compile the following minimal sample in Visual Studio 2015:\nclass Vector\n{\n float x;\n float y;\n\npublic:\n\n Vector(float x, float y) : x(x), y(y) {}\n Vector& operator = (const Vector& v) { x = v.x; y = v.y; return *this; }\n //Vector(Vector&&) = default;\n};\n\nclass Rect\n{\npublic:\n union {\n struct {\n Vector p1, p2;\n };\n\n struct {\n float p1x, p1y, p2x, p2y;\n };\n };\n\n Rect() : p1(0,0), p2(0,0) {}\n Rect(Vector& p1, Vector& p2) : p1(p1), p2(p2) {}\n\n /*Rect(const Rect&) = default;\n Rect& operator=(const Rect&) = default;\n Rect& operator=(Rect&&) = default;\n Rect(Rect&&) = default;*/\n};\n\nint main()\n{\n Rect test = Rect();\n test = Rect();\n return 0;\n}\n\nI got the following error messages:\n\n1>"} +{"output_text": "webSite mainCategory=\"Home Page\">http://www.hsbc.com/http://www.hsbc.com/http://www.hsbc.com/http://www.hsbc.com/http://www.hsbc.com", "input_text": ".29B. Net income applicable to \ncommon stockholders decreased 1% to $3.09B. 
Net interest income after loan \nloss provision reflects Asia segment increase of 22% to $3.86B, Net Interest \nMargin, Total -% increase of 2% to 1.67%, Net Interest Spread.\n \n \n 8 Canada Square\n \n \n LONDON\n \n E14 5HQ\n United Kingdom\n \n \n \n \n 44\n 20\n 79918888\n \n \n 44\n 20\n 79924880\n \n \n \n http://www.hsbc.com/<"} +{"output_text": "\n\n#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash, ToString, FromValue)]\nenum MyEnum {\n A,\n B,\n C,\n}\n\nimpl FromValue for MyEnum {\n fn from_value(value: Value) -> MyEnum {\n match value {\n Value::Int(i) => MyEnum::A,\n Value::Str(s)", "input_text": " the \"New England residents\" group to include Vermont and New Hampshire residents directly, instead of relying on the nested smart groups.\nEspecially don't make a smart group made up of members of at least one smart group and one static group! For reasons too esoteric to discuss, this will completely kill performance.\nConsider whether you can use an extension like CiviRules or SQL Tasks to replace the smart group. For instance, if you have a \"Major Donors\" smart group, defined as \"Everyone who ever gave $500+ in a single donation\", consider replacing it with a static group, and a CiviRule that adds donors to it when their donation exceeds your threshold.\nThe SQL command CiviCRM comes up with to determine a smart group is sometimes inefficient. Previously I wrote custom search extensions to optimize the SQL, but I've largely deprecated this approach in favor of SQL Tasks. Custom searches can still be helpful if end users need to be able to make variations on a particular query on their own.\n\nHopefully this will give you tools to best find and fix your slow smart groups and kill deadlocks for good!\n\n Q:\n\nHow do I convert MySQL enums to Rust enums when using the mysql crate?\n\nI want to read enums from a MySQL table, but I'm failing to convert the string enums from the table into real Rust enums. \nWhat options do I have? The documentation tells me that I should implement the FromValue trait:\nCargo.toml\n[dependencies]\nmysql = \"15.1.0\"\nstrum = \"0.14.0\"\nstrum_macros = \"0.14.0\"\n\nuse mysql::{\n prelude::{ConvIr, FromValue},\n Value,\n};"} +{"output_text": "NA, NA, NA, NA, NA, NA, NA))\n\ntib %>% \n mutate(txt = str_c(txt, collapse = \" \")) %>% \n group_by(id) %>% \n mutate(txt = str_c(txt, collapse = \" \")) %>% \n group_by(id) %>% \n mutate(txt = str_c(txt, collapse = \" \")) %>% \n group_by(", "input_text": " name = 'TestQueue') \n AND NOT EXISTS(\n SELECT \n 1 \n FROM \n myDB.dbo.WorkQueueItemTag WQIT \n WHERE \n WQIT.queueitemident = WQI.ident \n AND EXISTS(SELECT 1 FROM myDB.dbo.Tag WHERE id = WQIT.tagid AND tag LIKE '%Forwarded%') \n ) \nGROUP BY \n WQI.exceptionreason; \n\n Q:\n\nR: conditionally combine text from adjacent rows while retaining associated information\n\nThe script needs to:\na) combine text in adjacent rows, the number of adjacent rows may vary, the grouping of rows to be combined is determined by the first row preceded by a NA and the last row followed by NA, \nb) retain row ids for future checking\nc) retain a numeric variable associated with one row from each of the adjacent rows to be combined \nd) retain the overall order \n\nI have achieved this using a for loop and a load of data wrangling with dplyr and stringer.\nThe for loop is inelegant as I'm struggling with the logic to identify adjacent rows sequentially. 
This is not important as the grouping variable is just a helper - but it galls me.\nI also wonder if there might be a more efficient way to do this altogether maybe using rowwise and mutate with lead or lag.\nAny guidance or pointers would be appreciated.\nlibrary(tidyverse)\n\ntib <- tibble(id = 1:11,\n var = c(\"a\", NA, NA, \"b\", \"c\", NA, \"d\", NA, NA, NA, \"e\"),\n txt = c( NA, \"the\", \"cat\", NA, "} +{"output_text": "64\\msvcp_win.dll', Symbols loaded (source information stripped).\n'Game.exe': Loaded 'C:\\Windows\\SysWOW64\\msvcr100.dll', Symbols loaded (source information stripped).\n'Game.exe': Loaded 'C:\\Windows\\SysWOW64\\imm32.dll', Symbols loaded (source information stripped).\n'Game.exe': Loaded 'C:\\Windows\\SysWOW64\\", "input_text": "source information stripped).\n'Game.exe': Loaded 'C:\\Windows\\SysWOW64\\sechost.dll', Symbols loaded (source information stripped).\n'Game.exe': Loaded 'C:\\Windows\\SysWOW64\\winmmbase.dll', Symbols loaded (source information stripped).\n'Game.exe': Loaded 'C:\\Windows\\WinSxS\\x86_microsoft.windows.common-controls_6595b64144ccf1df_5.82.9600.17810_none_7c5b6194aa0716f1\\comctl32.dll', Symbols loaded (source information stripped).\n'Game.exe': Loaded 'C:\\Windows\\SysWOW64\\glu32.dll', Symbols loaded (source information stripped).\n'Game.exe': Loaded 'C:\\Windows\\SysWOW64\\ddraw.dll', Symbols loaded (source information stripped).\n'Game.exe': Loaded 'C:\\Windows\\SysWOW64\\sspicli.dll', Symbols loaded (source information stripped).\n'Game.exe': Loaded 'C:\\Windows\\SysWOW64\\cfgmgr32.dll', Symbols loaded (source information stripped).\n'Game.exe': Loaded 'C:\\Windows\\SysWOW64\\devobj.dll', Symbols loaded (source information stripped).\n'Game.exe': Loaded 'C:\\Windows\\SysWOW64\\dciman32.dll', Symbols loaded (source information stripped).\n'Game.exe': Loaded 'C:\\Windows\\SysWOW64\\cryptbase.dll', Symbols loaded (source information stripped).\n'Game.exe': Loaded 'C:\\Windows\\SysWOW"} +{"output_text": "}\n\n $scope.ok = function(){\n console.log('this makes a POST call when OK is clicked');\n }\n\n $scope.cancel = function(){\n console.log('this makes a POST call when Cancel is clicked');\n }\n\n};\n\nA:\n\nYou can use $scope.$on to listen for events on the scope.\n$scope.$on('$destroy', function() {\n $scope.selected", "input_text": "4 id=\"title-color\" class=\"modal-title\"> Please Confirm
' + message + '
';\n modalHtml += '
';\n\n var modalInstance = $modal.open({\n template: modalHtml,\n controller: ModalInstanceCtrl\n });\n\n modalInstance.result.then(function() {\n scope.ngConfirmClick({item:scope.item}); \n }, function() {\n });\n });\n }\n\n }\n }\n }\n]);\n\nHTML\n
\n This is the list I'm looping through:

{{peoples}}

\n\n When I cancel I want to return to the value Sam or whatever value was selected at the time of cancel.\n\n \n\n
\n\ncontroller:\nangular.module('plunker', ['ui.bootstrap']);\nvar ModalDemoCtrl = function ($scope, $modal, $log) {\n\n $scope.selectedPerson = 'Sam';\n $scope.peoples = ['Sam', 'Randy', 'Joe'] ; \n\n $scope.sendPost = function(){\n console.log('this makes a POST call normally');\n "} +{"output_text": ":conditionalFormatting xmlns:xm=\"\"http://schemas.microsoft.com/office/excel/2006/main\"\">\n \n \n ", "input_text": "stWs = xdoc.CreateNode(XmlNodeType.Element, \"extLst\", xdoc.DocumentElement.NamespaceURI);\n extLstWs.InnerXml = @\"\n \n \n \n \n \n 0\n \n \n 100\n \n \n \n \n A1:A20\n \n predictions;\n\n Predictions({this.predictions});\n\n factory Predictions.fromJson(Map json){\n return Predictions(\n predictions: json['predictions'] as List\n );\n }\n}\n\nclass Prediction {\n final String prediction;\n final String predictionICL;\n final String predictionRand", "input_text": "ICL;\n final LibraryPrediction predictionRandwyck;\n final LibraryPrediction predictionTapijn;\n final String message;\n\n PredictionsUpdate({\n this.predictionICL,\n this.predictionRandwyck,\n this.predictionTapijn,\n this.message,\n});\n\n factory PredictionsUpdate.fromJson(Map parsedJson){\n return PredictionsUpdate(\n predictionICL: LibraryPrediction.fromJson(parsedJson['predictionICL']),\n predictionRandwyck: LibraryPrediction.fromJson(parsedJson['predictionRandwyck']),\n predictionTapijn: LibraryPrediction.fromJson(parsedJson['predictionTapijn']),\n message: parsedJson['message'] as String,\n );\n }\n}\n\nclass LibraryPrediction {\n final String openingTimeToday;\n final String openingTimeTomorrow;\n final List percentagesToday;\n final List percentagesTomorrow;\n\n LibraryPrediction({\n this.openingTimeToday,\n this.openingTimeTomorrow,\n this.percentagesToday,\n this.percentagesTomorrow,\n });\n\n factory LibraryPrediction.fromJson(Map json){\n return LibraryPrediction(\n openingTimeToday: json['openingTimeToday'] as String,\n openingTimeTomorrow: json['openingTimeTomorrow'] as String,\n percentagesToday: json['percentagesToday'] as List,\n percentagesTomorrow: json['percentagesTomorrow'] as List,\n );\n }\n"} +{"output_text": "9c0)', 'mask_data', 'HASH(0xccf040)', 'tag_data', 'HASH(0xccf040)') called at /nfs/pdx/disks/nehalem.pde.077/projects/lib/Program-Plist-Pl/lib/Program/Plist/Pl.pm line 61\n\nProgram::Plist::Pl::_create_pattern_obj(undef, 'name',", "input_text": "lib64/site_perl/MooseX/Types/TypeDecorator.pm line 88\n\nMooseX::Types::TypeDecorator::new('MooseX::Types::TypeDecorator=HASH(0x1620c58)', 'name', 'g1145114N5582201_16161616a2x_FU02xxT_2bxc2e3_6x0xxxp0fx0xxx0x...','mask_data', '', 'tags', 0) called at /nfs/pdx/disks/nehalem.pde.077/projects/lib/Program-Plist-Pl/lib/Program/Plist/Pl.pm line 61\n\nProgram::Plist::Pl::_create_pattern_obj(undef, 'name', 'g1145114N5582201_16161616a2x_FU02xxT_2bxc2e3_6x0xxxp0fx0xxx0x...','mask_data', '', 'tag_data', '') called at /nfs/pdx/disks/nehalem.pde.077/projects/lib/Program-Plist-Pl/lib/Program/Plist/Pl.pm line 77\n\nProgram::Plist::Pl::BUILD('Program::Plist::Pl=HASH(0x162d6c0)', 'HASH(0x162d648)') called at generated method (unknown origin) line 101\n\nProgram::Plist::Pl::new('Program::Plist::Pl', 'name', 'bist_hfmmin_16161616_list', 'parents', 'HASH(0xccf040)', 'fh', 'GLOB(0xccc"} +{"output_text": " a custom window in WPF. I have a class called WindowBase which inherits from Window. I have a class called Window which inherits from WindowBase. I have a class called MainWindow which inherits from Window. 
I have a class called MainWindow which inherits from WindowBase.\nI have a MainWindow.xaml file which contains the following:\n MOISTURE_THRESHOLD) {\n pulseCount = 0;\n attachInterrupt(SENSOR_INTERRUPT, pulseCounter, FALLING);\n digitalWrite(RELAY_PIN, LOW); // open valve\n watering = true;\n }\n }\n\n // Stop watering after the required volume.\n if (watering) {\n if (getPulseCount() >= wateringPulseCount) {\n digitalWrite(RELAY_PIN, HIGH); // close valve\n detachInterrupt(SENSOR_INTERRUPT);\n watering = false;\n }\n }\n}\n\nNote that the pulse count has to be read with interrupts disabled, but\nyou should not detach the interrupt handler. Detaching the handler\nmeans you will miss pulses, wheres disabling the interrupts only delays\ntheir handling until you enable them again. You won't miss pulses unless\nto keep the interrupts disabled for too long (which you shouldn't do).\n\n Q:\n\nWPF: Inheriting from Window\n\nI seem to be having some trouble creating"} +{"output_text": "\n }\n return null;\n}\n\nBut I still get the same error.\nI also tried to get the name from the Uri like this:\npublic String getNameKitkat(Uri contentUri) {\n Cursor cursor = getContentResolver()\n .query(contentUri, null, null, null, null, null); // <--EXCEPTION\n try {\n if (cursor!= null && cursor.moveToFirst()) {\n ", "input_text": "\n if (cursor!= null) {\n cursor.close();\n }\n }\n return null;\n}\n\nWhere I received following error:\n09-23 16:54:17.664 32331-32341/? E/DatabaseUtils\ufe55 Writing exception to parcel\njava.lang.UnsupportedOperationException: Unsupported column: _data\n at com.google.android.apps.docs.quickoffice.FileContentProvider.query(FileContentProvider.java:78)\n at android.content.ContentProvider.query(ContentProvider.java:857)\n at android.content.ContentProvider$Transport.query(ContentProvider.java:200)\n at android.content.ContentProviderNative.onTransact(ContentProviderNative.java:112)\n at android.os.Binder.execTransact(Binder.java:404)\n at dalvik.system.NativeStart.run(Native Method)\n\nAfter some research I figured out that Android 4.4 Kitkat introduces SAF (Storage Access Framework) which manages data different, so I tried the following Code from https://developer.android.com/guide/topics/providers/document-provider.html to get the name:\npublic String getNameKitkat(Uri contentUri) {\nCursor cursor = getContentResolver()\n .query(contentUri, null, null, null, null, null); // <--EXCEPTION\n try {\n if (cursor!= null && cursor.moveToFirst()) {\n String displayName = cursor.getString(\n cursor.getColumnIndex(OpenableColumns.DISPLAY_NAME));\n\n return displayName;\n }\n } finally {\n cursor.close();"} +{"output_text": "bleshooting mode is a legitimate state.\n\nA:\n\nI have been in contact with Apple support for the past few days. I have been told that my account is in troubleshooting mode and that I need to provide a password to get it out of it. I have been told that I can't provide my password to them.\n\nThis is not true.\n\nI have been told that I can't provide my password to them.\n\nThis is not true", "input_text": ", I asked if I could simply hand them my password (it was already a temporary password, but due to a screwup on their part, my account was out of Troubleshooting mode). The Senior Advisor declined, citing policy that they must provide a randomly-generated password and could not accept a password from a customer. 
This is a very important point, because I get the impression from the reactions/answers that people think the support technician is asking @Oleg for his password. That is NOT the case.\nI feel I should also add that yes, I am 100% certain I was talking to Apple employees the entire time. I contacted them through the Apple Support site, they called me back from the same Apple number every time, which I have saved in my Contacts, and every technician I was in touch with emailed me from an @apple.com address, to which I was able to send emails and get responses from (so that takes care of spoofed headers). They're able to Screen Share just by knowing your Apple ID but not your IP address, then ask you to upload diagnostics data to green header address ending in apple.com. It would take a very high degree of sophistication to pull off a scam of this magnitude (not to mention, if all they cared about was accessing your iCloud account, they could just stop once they got your password instead of spending hours upon hours going through troubleshooting steps that don't get them any additional data about you).\nAnd obviously, when you get a response from the Corporate Executive Relations Office after emailing Tim Cook directly, you're pretty confident it's an Apple employee talking to you (the response includes your original email). If that person acknowledges that your account is in Troubleshooting mode and understands that you would like to get it out of it, then you're also pretty confident that Trou"} +{"output_text": "\n child: FloatingActionButton.extended(\n onPressed: () {},\n backgroundColor: Colors.transparent,\n child: Icon(\n Icons.arrow_back,\n size: 24,\n color: Colors.black,\n ),\n ),\n),\n\n Q:\n\nHow to get the current date in a specific format in Java?\n\nI have a date in the format of \"yyyy-MM", "input_text": "e\n shape: CircleBorder(\n side: BorderSide(color: Colors.black, width: 5),\n ),\n child: Icon(\n FontAwesomeIcons.chevronCircleUp,\n color: Colors.white.withOpacity(.8),\n size: 80,\n )),\n ),\n\nThanks for the help\n\nA:\n\nWrap your InkWell widget with Material widget and material widget should have a color.\n Material( color: Colors.transparent,\n child: InkWell( onTap: () { print(\"tapped\"); }, ) );\n\nA:\n\nI see that you'd like granular controll over the size of the ripple.\nI ended up using the code below.\nPadding(\n padding: EdgeInsets.all(8.0),\n child: InkWell(\n customBorder: new CircleBorder(),\n onTap: () {},\n splashColor: Colors.red,\n child: new Icon(\n Icons.arrow_back,\n size: 24,\n color: Colors.black,\n ),\n ),\n)\n\nThe InkWell effect renders a square, however using CircleBorder crops it to a circular shape.\nBy default the effect attempts to fill the space, so to modify the size I added padding on all sides, cropping the effect. 
If you are still having trouble with the ripple effect not rendering at all, wrapping your code in a Material() should fix most issues, or taking a look at the app theme.\n\nA:\n\nI used a FAB without background.\nFloatingActionButton(\n onPressed: () {},"} +{"output_text": "(char)\n key = key % 256\n key = key / 256\n key = key + 1\n key = key % 256\n key = key / 256\n key = key + 1\n key = key % 256\n key = key / 256\n key = key + 1\n key = key % 256\n key = key / 256\n key = key + 1\n key = key % 256\n key = key /", "input_text": " = 0; i < n; i++) {\n list2[i] = input.nextDouble();\n\n if (i == (n - 1)) {\n System.out.println(\"\\t The average of the 5 doubles is \"\n + average(list2, n));\n }\n\n }\n }\n\n public static int average(int[] array, int n) {\n int sum = 0;\n for (int i = 0; i < n; i++) {\n int holdNumber = array[i];\n sum += holdNumber;\n }\n\n int average = sum / n;\n return average;\n\n }\n\n public static double average(double[] array, int n) {\n double sum = 0;\n for (int i = 0; i < n; i++) {\n double holdNumber = array[i];\n sum += holdNumber;\n }\n double average = sum / n;\n return average;\n }\n}\n\n Q:\n\nAre these C# and Python code snippets functionally identical?\n\nI have a bit of Python source code that I want to recreate in C#. The code is about reading and decrypting a binary file. I have tested the function on an existing file and it runs without errors; while the resulting string is not garbled or anything, it does not appear to be useful to me.\nBut that is outside the scope of this question. I only want to know if I have translated the function correctly to C# so it does the same as in Python.\nThe Python code:\n filename = os.path.basename(path_and_filename)\n key = 0\n for char in filename:\n key = key + ord"} +{"output_text": "local1=\"clr-namespace:MTCQuest.CustomControls\"\n mc:Ignorable=\"d\" \n d:DesignHeight=\"300\" d:DesignWidth=\"300\">\n\n \n\n \n \n \n \r\n\r\n\r\n
Section 1
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
Section 2
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
Section 3
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
\r\n
"} +{"output_text": "\": 16.30,\n \"endTime\": 17.00,\n \"value1\": \"2016-01-20\",\n \"value2\": \"2016-01-21\",\n \"color\": \"#9DC481\",\n\n }, {\n \"name\": \"105\",\n \"startTime\": 17.00,\n \"endTime\": 18.00,\n \"value1\": \"2016-01-20\",\n \"value", "input_text": "light\",\n \"type\": \"serial\",\n \"dataDateFormat\": \"YYYY-MM-DD\",\n \"dataProvider\": [ {\n \"name\": \"111\",\n \"startTime\": 7.00,\n \"endTime\": 17.0,\n \"value1\": \"2016-05-03\",\n \"value2\": \"2016-05-05\",\n \"color\": \"#FF3F33\",\n\n }, {\n \"name\": \"220\",\n \"startTime\": 18.17,\n \"endTime\": 16.18,\n \"value1\": \"2016-03-15\",\n \"value2\": \"2016-03-20\",\n \"color\": \"#FFB134\",\n\n }, {\n \"name\": \"109\",\n \"startTime\": 17.0,\n \"endTime\": 18.40,\n \"value1\": \"2016-01-20\",\n \"value2\": \"2016-01-22\",\n \"color\": \"#F9FF34\",\n\n }, {\n \"name\": \"108\",\n \"startTime\": 19,\n \"endTime\": 21,\n \"value1\": \"2016-01-20\",\n \"value2\": \"2016-01-21\",\n \"color\": \"#36DB44\",\n\n }, {\n \"name\": \"107\",\n \"startTime\": 14.50,\n \"endTime\": 16.30,\n \"value1\": \"2016-01-20\",\n \"value2\": \"2016-01-22\",\n \"color\": \"#9DC481\",\n\n }, {\n \"name\": \"106\",\n \"startTime"} +{"output_text": "Stack &val);\n explicit JSON(const JSON &val);\n explicit JSON(const JSON &val, Type type);\n explicit JSON(const JSON &val, Type type, const UnicodeString &key);\n explicit JSON(const JSON &val, Type type, const UnicodeString &key, const UnicodeString &val);\n explicit JSON(const JSON &val, Type type, const UnicodeString &key, const UnicodeString &val", "input_text": " are using one of the \"classic\" pre-C++11 compilers, you will have to adjust this code accordingly):\n//---------------------------------------------------------------------------\n#ifndef JSONH\n#define JSONH\n\n#include \n#include \n\n#include \n#include \n#include \n#include \n#include \n\n#include \n#include \n#include \n#include \n#include \n\nclass JSON\n{\n public: \n\n // JSON types\n enum Type\n {\n Integer,\n Boolean,\n Float,\n String,\n Object,\n Array,\n Null\n };\n\n using UniquePtr = std::unique_ptr;\n using SharedPtr = std::shared_ptr;\n using Pair = std::pair;\n using objectStack = std::stack;\n using arrayStack = std::stack;\n\n // Static functions\n static UniquePtr JSON::parse(const UnicodeString &str);\n static UniquePtr JSON::parser(TJsonTextReader& json_reader);\n\n JSON();\n explicit JSON(int val);\n explicit JSON(bool val);\n explicit JSON(float val);\n explicit JSON(const UnicodeString &val);\n explicit JSON(objectStack &val);\n explicit JSON(array"} +{"output_text": " Group\n Are you sure to assign the group?\n \n \n \n \n\nIn vendor/magento/module-customer/view/frontend/layout/customer_index_index.xml added some params:\n \n", "input_text": "\" xsi:type=\"string\">status\n \n Confirm\n Are you sure to change the status?\n \n \n \n \n\nAfter update i get the error message:\nException #0 (InvalidArgumentException): Class argument is invalid: Company\\Module\\Ui\\Component\\MassAction\\ChangeStatus\nThe code above was the only i found. Is there another right way to do this? 
Or is there an error in my code?\nRegards,\nAndreas\n\nA:\n\nYou should change your code using example from any magento 2.3 listing, for example you can use the customer group change action in the customer grid:\nSee source model here:\nvendor/magento/module-customer/Ui/Component/MassAction/Group/Options.php\nIn vendor/magento/module-customer/etc/di.xml added some params:\n \n \n \n customer/index/massAssignGroup\n group\n \n Assign"} +{"output_text": ".\nI have a plugin that uses clang bindings. I have a function that calls clang_getDiagnosticClient() and then calls clang_getDiagnosticClient() again. This is a function that is called from a plugin.\nThe problem is that sometimes the second call to clang_getDiagnosticClient() fails with libclang: crash detected during reparsing. This then leads to crashing plugin host.\nI have a small example that reproduces the problem", "input_text": "\nUna de las cosas es que en el JSON que esta dentro de una lista (en donde me funcionaba), ah\u00ed pregunto si en la posici\u00f3n 0 de esa lista existe un atributo username y si coincide lo traiga, pero esta vez el match no se efectua y entonces no se establece conexi\u00f3n y mi intento de consumir datos no funciona.\nCualquier ayuda ser\u00e1 apreciada.\n\nA:\n\nTe sobra un acceso al JSON:\nstring jsonName = (string)itemData[0][\"username\"];\n\nEsa l\u00ednea te est\u00e1 lanzando una Exception que se recoge en el try-catch y tu registro dice Failed Connection. El \u00edndice 0 de itemData es username, y dentro de username no hay nada, pero t\u00fa buscas un elemento con la propiedad username y salta el Exception.\nLo correcto es: \nstring jsonName = (string)itemData[\"username\"];\n\n... y as\u00ed con todas las dem\u00e1s.\n\n Q:\n\nsublime text 3 plugin host crash recovery\n\nI develop a plugin for Sublime Text 3 and my python code uses c type bindings to clang. Sometimes calling libclang would segfault with libclang: crash detected during reparsing (I don't understand the reason yet, but it is irrelevant to this question). This then leads to crashing plugin host.\nSo the question is: is there any way in python to recover from a failure in the underlying c binding? I would gladly just skip this action on this particular file where I experience the crash.\nThanks! \nUPD: There was a short discussion in comments and it makes sense to elaborate further on the lack of a proper small reproducible example"} +{"output_text": " $submitted_data[ 'user_phone' ],\n 'user_email' => $submitted_data[ 'user_email' ],\n 'user_website' => $submitted_data[ 'user_website' ],\n 'user_notes' => $submitted_data[ 'user_notes' ],\n 'user_date_added' => $submitted_data[ 'user_date_added' ],\n 'user", "input_text": " = '$order_id'\" );\n\n if ( $result!= NULL )\n add_filter( 'ninja_forms_render_default_value', 'custom_plugin_autofill_form', 10, 3 );\n}\n\ndo_action( 'check_for_previous_order' );\n\nadd_action( 'ninja_forms_after_submission', 'custom_plugin_save_db' );\nfunction custom_plugin_save_db( $form_data ) {\n global $wpdb;\n $table_name = $wpdb->prefix. \"my_custom_plugin_table\";\n $submitted_data = [];\n\n foreach ( $form_data[ 'fields' ] as $field ) {\n $key = $field[ 'key' ];\n $value = $field[ 'value' ];\n $submitted_data[ $key ] = $value;\n }\n\n $wpdb->replace( $table_name, array(\n 'order_id' => $submitted_data[ 'order_project_id' ]. '-'. 
get_current_user_id(),\n 'user_company' => $submitted_data[ 'user_company' ],\n 'user_name' => $submitted_data[ 'user_name' ],\n 'user_address' => $submitted_data[ 'user_address' ],\n 'user_city' => $submitted_data[ 'user_city' ],\n 'user_state' => $submitted_data[ 'user_state' ],\n 'user_zip' => $submitted_data[ 'user_zip' ],\n 'user_phone' =>"} +{"output_text": "aven\n\nI have a plugin that is used to generate a report. The plugin is used in a multi-module project. The plugin is used in the parent pom.xml. The plugin is used in the child pom.xml.\nThe plugin is used in the parent pom.xml as follows:\n\n com.mycompany.myplugin\n myplugin\n 1.0-", "input_text": "\n973 7.563686 0.552758\n974 7.564105 0.540144\n975 7.564428 0.541411\n976 7.564351 0.532154\n977 7.564408 0.530999\n978 7.564681 0.532376\n979 7.565192 0.536758\n980 7.565359 0.538629\n981 7.566112 0.555789\n982 7.566678 0.553163\n983 7.566364 0.577953\n984 7.567587 0.634843\n985 7.568583 0.679807\n986 7.569268 0.662653\n987 7.570018 0.630447\n988 7.570375 0.659497\n989 7.570704 0.622190\n990 7.571009 0.485458\n991 7.571886 0.551147\n992 7.573148 0.459912\n993 7.574134 0.463146\n994 7.574478 0.463158\n995 7.574671 0.535014\n996 7.575177 0.467705\n997 7.575374 0.531098\n998 7.575620 0.540611\n999 7.576727 0.465572\n\n[1000 rows x 2 columns]\n\n Q:\n\nAccessing a project resource from a plugin in M"} +{"output_text": "+ + + +---+ +---+\n| V | q q | g | e e e e | s s s | r r r | p p p |\n+ + +---+---+ + + +---+ + + +---+ +---+ +---+\n| V | q q | g | e e e e", "input_text": " | d | c | n n | C | v | N N N | R R |\n+---+ + + + + +---+---+---+ +---+---+---+---+ + +\n| M M | i i | h | d d d | C C C | u u | R R R |\n+ +---+---+ +---+ + +---+---+---+---+---+---+---+ + +\n| M | S S | i | e | d d | E E | s s s | O O | R R |\n+ +---+ + + +---+---+ +---+ + +---+---+ +---+ +\n| M | q | S | i | e e e | E | s s s | t | r | O | p | R |\n+ + +---+---+ + + +---+ + + +---+ +---+ +---+\n| M | q q | g | e e e e | s s s | r r | p p p |\n+---+ +---+---+---+---+ + +---+ + + + +---+ +---+\n| V | q | U U | f f | e e | o | s s | r r r | p | Q |\n+ +---+ + +---+ +---+---+ +---"} +{"output_text": " can make the spotlight a tool to help the PCs grow and develop, rather than a tool to make them shine.\n\nA:\n\nI think the best way to do this is to have the spotlight be a tool to help the PCs grow and develop, rather than a tool to make them shine.\n\nThe spotlight is a tool to help the PCs grow and develop, rather than a tool to make them shine.\n\nThe spotlight is a tool to help the PCs grow and develop", "input_text": " attention\u2026\n\nRemember, GM moves aren't always bad! \nYou could separate them so Cleric has some alone time in the spotlight:\n\nYou tumble down the sudden and slippery tunnel slope. You land with a bump\u2026 alone. Some trick of the stalagmites must have channeled you each down a different branch of the downward tunnel. Cleric, you see [stuff], what do you do?\n\nYou can show signs of an approaching threat, but that only Cleric notices (or maybe that only Cleric can notice):\n\nCleric, while the ogre and Fighter are occupied with each other, you notice a dark cloud moving toward you all, coming over the trees. A moment later and you realise it's a dense swarm of flying insects. It'll probably be here within a minute. What do you do?\n\nor\n\nThere's a presence hanging over the angry mob. You can all feel something, but Cleric, you can suddenly see it. It's a shining \u2014 angel, looks like? Huh. 
What do you do?\n\nYou can reveal an unwelcome truth to the Cleric about something she'd be more readily perceptive about than any other PCs, or perhaps just more perceptive about currently:\n\nCleric, the Mayor's speech is suddenly touching on religious themes, but in a way that gives you a sinking feeling that she's about to renege on the deal and hang you all out to dry. And in front of the King, no less. What do you do?\n\nAny time someone rolls a miss, or the group looks to you to find out what happens, or you're otherwise given a Golden Opportunity, you can pick a move and use it to throw the spotlight on one PC exclusively, to see them shine and learn more about who they are. In this way, you"} +{"output_text": "fopen(\"DNAS.txt\",\"r\");\n if(arquivo==NULL)\n {\n printf(\"O arquivo nao pode ser lido\\n\");\n system(\"pause\");\n return 0;\n }\n int i=0;\n while(fgets(linhas[i],11,arquivo)!=NULL)\n {\n printf(\"\\n\");\n fgetc(arquivo", "input_text": ":%i%% G:%i%%\\n\\n\",(soma_AT*100)/20,(soma_AT*100)/20,(soma_CG*100)/20,(soma_CG*100)/20);\n habilidades(soma_CG,soma_AT);\n calcular_peso(ler_linha,&qtd_G,&qtd_C);\n return 0;\n}\n\nint main()\n{\n char bases[11],base_2[11];\n FILE *arquivo;\n arquivo = fopen(\"DNAS.txt\",\"r\");\n\n if(arquivo==NULL)\n {\n printf(\"O arquivo nao pode ser lido\\n\");\n system(\"pause\");\n return 0;\n }\n\n while(fgets(bases,11,arquivo)!=NULL)\n {\n printf(\"\\n\");\n fgetc(arquivo);\n printf(\"\\t%s\\n\",bases);\n cria_complementar(bases,base_2);\n system(\"pause\");\n system(\"cls\");\n }\n\n fclose(arquivo);\n\n return 0;\n}\n\nA leitura do arquivo \u00e9 na seguinte formata\u00e7\u00e3o:\nCGATGCATGC\n\nV\u00e1rias linhas usando apenas ATCG e a impress\u00e3o no arquivo \u00e9 a mesma linha seguido de um n\u00famero.\n\nA:\n\nPrimeira coisa que eu reparei\nint top_IDS(int ID_DNA, char *linhas[])\n{\n FILE *arquivo;\n arquivo="} +{"output_text": ". . . / / \\ \\ \\:::..::.|\n / . . . / / \\ \\ \\:::..::.|\n/ . . . / / \\ \\ \\:::..::.|\n/ . . . / / \\ \\ \\:::..::.|\n/ . . . / ", "input_text": "li> \n
  • List item
  • \n \n \n
    \n \n \n \n \n\n \n Please install the Shockwave plugin to watch this movie.\n \n \n
    \n\n                                       _,'/\n                                  _.-''._:\n                         ,-:`-.-'   .:.|\n                         ;-.''      .::.|\n          _..------.._  / (:.      .:::.|\n      ,'.  .... .`/  : :.    .::::.|\n    ,'..   . .  ./    \\ ::..::::::.|\n  ,'"}
    +{"output_text": " data from the server.\n$.ajax({\n    url: 'http://localhost:3000/api/forms/',\n    type: 'GET',\n    dataType: 'json',\n    data: {\n        id: id\n    },\n    success: function(data) {\n        var form = JSON.parse(data);\n        var template = $('#template').html();\n        var partials = $('#partials').html();\n        var formData", "input_text": "\n        }\n\n Q:\n\nUse mustache for templating smaller parts for use multiple times in a form\n\nOkay. So I hope this is specific enough.\nI'm, kind of new to mustache, but see it has great potential, so why not use it.\nI'm making a quite big form, and want to have the form built with mustache. So i have started to make the form in mustache and, then i realized i want to template the form-elements. One template for how i want every narrow input, wide input, select etc. to look like. because now i'm repeating myself. \nMy template and partials are provided through $.ajax get, where the main form template are defined as a mustache file, with html content, and the partials are defined as a mustache file with every template inside -tags.\nVariables for mustache to use. This object is somewhat subject for change.\nvar jsonForm = {\n        oneInputField: {\n            value:'put your title here',\n            rule_set: {\n                required: {\n                    strName:'required',\n                    strErrorMsg: 'error message'\n                }\n            }\n        },\n        oneSelect: {\n            options: [\n                {value: '- Pick one -', helper: 'helper', select_options: {disable_search: true}},\n                {value: 'option1', selected: true},\n                {value: 'option2'},\n                {value: 'option3'},\n                {value: 'option4'}\n            ],\n            rule_set: {\n                required: {\n                    strName:'required',\n                    strErrorMsg: 'error message'\n                }\n            }\n        }\n    };\n\nHow i fetch the"}
    +{"output_text": "\nA:\n\nI think you're looking for the Task.FromResult method.\n\nCreates a task that represents the completion of an asynchronous operation.\n\nIt's a bit of a mouthful, but it's basically a wrapper around the Task.Factory.StartNew method.\n\nA:\n\nI think you're looking for the Task.FromResult method.\n\nCreates a task that represents the completion of an asynchronous operation.", "input_text": ", there will be no retries.\nI'm looking for a Task compatible equivalent of Lazy(Func, LazyThreadSafetyMode.PublicationOnly), i.e. it should behave as that is specified:-\n\nAlternative to locking  In certain situations, you might want to avoid the overhead of the Lazy object's default locking behavior. In rare situations, there might be a potential for deadlocks. In such cases, you can use the Lazy(LazyThreadSafetyMode) or Lazy(Func,\u2002LazyThreadSafetyMode) constructor, and specify LazyThreadSafetyMode.PublicationOnly. This enables the Lazy object to create a copy of the lazily initialized object on each of several threads if the threads call the Value property simultaneously. The Lazy object ensures that all threads use the same instance of the lazily initialized object and discards the instances that are not used. Thus, the cost of reducing the locking overhead is that your program might sometimes create and discard extra copies of an expensive object. In most cases, this is unlikely. The examples for the Lazy(LazyThreadSafetyMode) and Lazy(Func,\u2002LazyThreadSafetyMode) constructors demonstrate this behavior.\nIMPORTANT\nWhen you specify PublicationOnly, exceptions are never cached, even if you specify a factory method.\n\nIs there any FCL, Nito.AsyncEx or similar construct that might fit in nicely here? Failing this, can anyone see an elegant way to gate the \"attempt in progress\" bit (I'm OK with each caller making its own attempt in the same way that a Lazy(..., (LazyThreadSafetyMode.PublicationOnly) does) and yet still have that and the cache management encapsulated neatly?\n"}
    +{"output_text": "ailingErrors>\n                        true\n                        true\n                        true\n                        true\n                        true\n                        true\n                        true\n                        true\n", "input_text": " => Tasks and Events\n                            [type] => Activity\n                        )\n\n                )\n\n        )\n\n)\n\nAs you can see in the results only the CALLTYPE column filter applied (this is the default filter).\nDoes anyone see anything wrong with the POST data that I am sending?\nI have tried using other reports and other date columns to filter on, but it is always the same, only the default filters apply.\n\nA:\n\nI found the answer. I was not setting the header Content-type to application/json. Once I did that the filters started working.\n\n Q:\n\npmd maven plugin does not respect exclude-cpd.properties\n\nI have maven project with multiple modules, the structure is like this:\n/myProject \n  pom.xml\n  exclude-cpd.properties\n  -- module1\n     pom.xml\n     exclude-cpd.properties\n  -- module2\n     pom.xml\n     exclude-cpd.properties\n\nin my parent pom.xml i added pmd plugin:\n\n        \n            \n                \n                    org.apache.maven.plugins\n                    maven-pmd-plugin\n                    3.12.0\n                    \n                        \n                            pmd_ruleset.xml\n                        \n                        80\n                        true\n                        exclude-cpd.properties\n                        true\n    
    \n
    \n
    \n <", "input_text": " preserve-3d;\r\n -webkit-transform-origin: center;\r\n -moz-transform-origin: center;\r\n -o-transform-origin: center;\r\n transform-origin: center;\r\n animation: removeAnim 0.5s ease,\r\n otherAnimation 0.5s ease;\r\n}\r\n@keyframes otherAnimation {\r\n 0% {\r\n transform: scale(1.0);\r\n }\r\n 25% {\r\n transform: scale(0.75);\r\n }\r\n 50% {\r\n transform: scale(0.50);\r\n }\r\n 75% {\r\n transform: scale(0.25);\r\n }\r\n 100%{\r\n transform: scale(0);\r\n }\r\n}\n\r\n
    \r\n

    \r\n
    \r\n
    \r\n
    \r\n

    Test

    \r\n
    \r\n
    \r\n

    MyBack

    \r\n
    \r\n
    \r\n
    \r\n \r\n
    \r\n
    \n\n Q:\n\nJQuery Datepicker in Bootstrap modal\n\nI'm working with Javascript only for a couple of months, and am stuck with this issue. I have a page with 2 modals, which contains a JQuery-UI-Datepicker. First modal creates an entry with start date, and the second modal allows"} +{"output_text": "add(new Chunk(\"\\n\\nWard No. 1- 9957047867 Ward No. 17- 9864623744\", FontFactory.getFont(FontFactory.TIMES_ROMAN, 6)));\n ph6.add(new Chunk(\"\\n\\nWard No. 1- 9957047867 Ward No. 17- 9864623744\", FontFactory.getFont(FontFactory.TIMES_ROMAN", "input_text": ")));\n assementIdOldCell.setHorizontalAlignment(Element.ALIGN_LEFT);\n detailsTable.addCell(assementIdOldCell);\n\n PdfPCell assementIdNewValueCell=new PdfPCell(new Phrase(\"225642 \", FontFactory.getFont(FontFactory.TIMES_ROMAN, 8, Font.NORMAL)));\n assementIdNewValueCell.setHorizontalAlignment(Element.ALIGN_RIGHT);\n detailsTable.addCell(assementIdNewValueCell);\n\n details.setBorderWidthTop(0);\n details.setBorderWidthBottom(0);\n details.addElement(detailsTable);\n maintable.addCell(details);\n\n Paragraph ph6=new Paragraph();\n ph6.add(new Chunk(\"HOUSEHOLD WASTE COLLECTION \", FontFactory.getFont(FontFactory.TIMES_ROMAN, 8, Font.BOLDITALIC,Color.RED)));\n ph6.add(new Chunk(\"\\n\\nWard-wise contact number of NGOs for collecting household waste from your doorstep:- \", FontFactory.getFont(FontFactory.TIMES_ROMAN, 6)));\n ph6.add(new Chunk(\"\\n\\nWard No. 1- 9957047867 Ward No. 17- 9864623744\", FontFactory.getFont(FontFactory.TIMES_ROMAN, 6)));\n ph6.add(new Chunk(\"\\n\\nWard No. 1- 9957047867 Ward No. 17- 9864623744\", FontFactory.getFont(FontFactory.TIMES_ROMAN, 6)));\n ph6."} +{"output_text": " } \n }\n }\n }\n] )\n\ngives the following result:\n{\n \"_id\" : \"Counts\",\n \"Documents\" : 2,\n \"Downloads\" : 2\n}\n\nThe following is the data I am using; there are 4 documents. When I run the code I get a count of 2 (which is correct and expected as there are two paths which match the pattern \"^C:\\\\Users\\\\", "input_text": " in C:\\Users\\A\\Downloads and how many elements there are in C:\\Users\\A\\Documents. How can I do it?\n\nA:\n\ni want to see how many documents I have in a collection with the same\n base path:\n\nAssuming you are supplying the base path to find the number of documents with that base path - the following regex query will count all the documents with the path field value staring with \"C:\\Users\\A\\Downloads\\\".\ndb.paths.find( { path: /^C:\\\\Users\\\\A\\\\Downloads\\\\/ } ).count()\n\nCode Using MongoDB Java Driver:\nPattern p = Pattern.compile(\"^C:\\\\\\\\Users\\\\\\\\A\\\\\\\\Documents\\\\\\\\\");\nBson queryFilter = regex(\"path\", p);\nlong count = collection.countDocuments(filter);\n\nThe following is the data I am using; there are 4 documents. 
When I run the code I get a count of 2 (which is correct and expected as there are two paths which match the pattern \"^C:\\\\Users\\\\A\\\\Documents\\\\\").\n\nUsing the same data as shown in the Compass screenshot, the following aggregation\ndb.paths.aggregate( [\n { \n $group : { \n _id : \"Counts\", \n Documents: { \n $sum: {\n $cond: [ { $regexMatch: { input: \"$path\", regex: /^C:\\\\Users\\\\A\\\\Documents\\\\/ } }, 1, 0 ]\n } \n },\n Downloads: { \n $sum: {\n $cond: [ { $regexMatch: { input: \"$path\", regex: /^C:\\\\Users\\\\A\\\\Downloads\\\\/ } }, 1, 0 ]\n"} +{"output_text": ":\nvoid foo(int a) {\n setState(() {\n a++;\n });\n}\n\nThe same applies to your _buildTeamSelector method.\n\nA:\n\nYou are not updating the state variable, you are updating the parameter of the method.\nYou should change your code to:\nvoid _buildTeamSelector(int team1Index, int team2Index) {\n setState(() {\n team1Index = team1", "input_text": " 5,\n child: Column(\n children: [\n Column(\n children: [\n Text('Value of team1Index: :$team1Index'),\n Row(\n mainAxisAlignment: MainAxisAlignment.spaceBetween,\n children: [\n Text('Teams:'),\n _buildTeamSelector(team1Index),\n _buildTeamSelector(team2Index),\n Text('Red'),\n ],\n ),\n ],\n ),\n ],\n ),\n );\n }\n}\n\nI/flutter (12114): Value of teamColorIndex 0\nI/chatty (12114): uid=10085(com.example.game_tracker) Thread-2 identical 1 line\nI/flutter (12114): Value of teamColorIndex 0\nI/flutter (12114): Value of teamColorIndex 0\nI/chatty (12114): uid=10085(com.example.game_tracker) Thread-2 identical 4 lines\nI/flutter (12114): Value of teamColorIndex 0\nI/flutter (12114): Value of teamColorIndex 0\n\nWith in the setState the variable doesn't seem to be getting updated. I know this is the issue but i don't know why.\n\nA:\n\nThe variable you are updating isn't the one you think.\nYou have two variables with the same name:\n\na property of State\na parameter of _buildTeamSelector\n\nWhat you changed inside the setState call isn't the State property, but the parameter of your method.\nInstead of:\nint a;\nvoid foo(int a) {\n a++;\n}\n\ndo"} +{"output_text": "\r\n padding-bottom: 25px;\r\n padding-left: 15px;\r\n padding-right: 15px;\r\n font-size: 16px;\r\n font-weight: bold;\r\n color: #fff;\r\n text-decoration: none;\r\n display: inline-block;\r\n position: relative;\r\n cursor: pointer;\r\n border-radius: 0;\r\n}\r\n.business-post-link:hover {\r\n color: #fff;\r\n background: #", "input_text": "container-->\r\n
    \n\nA:\n\nMaybe try putting some col-md-2 place holder divs on either side?\nThe first row of three buttons, since all three divs are col-md-4 in a 12 column system, the top row of buttons end up centered between columns 2 & 3, between 6 & 7, and between 10 & 11.\nThe ideal place to center the two buttons for the second row would be between columns 4 & 5 and between columns 8 & 9. But if you do two big col-md-3 columns, they end up getting centered between columns 3 & 4 and between columns 9 & 10.\n\n.content-sm {\r\n padding-top: 60px;\r\n padding-bottom: 60px;\r\n}\r\n@media (min-width: 1200px)\r\n.container {\r\n width: 1170px;\r\n}\r\n@media (min-width: 992px)\r\n.container {\r\n width: 970px;\r\n}\r\n@media (min-width: 768px)\r\n.container {\r\n width: 750px;\r\n}\r\n.container {\r\n padding-right: 15px;\r\n padding-left: 15px;\r\n margin-right: auto;\r\n margin-left: auto;\r\n}\r\n* {\r\n border-radius: 0;\r\n}\r\n\r\n\r\n/*Business Posts Section\r\n------------------------------------*/\r\n.business-post-section {\r\n position: relative;\r\n background: #f7f7f7;\r\n}\r\n\r\n\r\n/*Business Posts Links/Buttons\r\n------------------------------------*/\r\n.business-post-link {\r\n padding-top: 25px;"} +{"output_text": "()\nexport class UserService {\n private _user: any;\n private _tempuser: any;\n\n constructor(private http: Http) { }\n\n setUser(user: any) {\n this._user = user;\n this._tempuser = user;\n }\n\n getUser(): any {\n return this._user;\n }\n\n getTempUser(): any {\n return this._tempuser;\n ", "input_text": ".Sum(time => time.Hours),\n TotalTrav = employeeTimes.Where(time => time.PayCode == \"214\")\n .Sum(time => time.Hours),\n TotalSub = employeeTimes.Where(time => time.PayCode == \"251\")\n .Sum(time => time.Hours),\n FName = employee.FName,\n LName = employee.LName,\n Div = employee.Dept,\n AccID = employee.AccountingID,\n Period = employeeTimes.Key.Period,\n Status = employeeTimes.First().Status,\n };\n\n Q:\n\nData saved in service lost on page refresh or change\n\nI have an Angular 5 application, and I'm attempting to use a Google OAuth login to get a username and then set that username in a service to be used as the logged in user. Before I added the login I set the value manually in the service and it worked with no issue. Now I'm setting the username from the Google login it looks like the value I set is being lost each time the page refreshes (or when a new instance of the service is called).\nThe value returns correctly from the Google login (I checked in the console), so I know everything is ok there. My impression of Angular services was that they were constant across the other modules? Is it the case that each time I call that service it's creating a new empty 'tempuser' variable? If so is there any way around this so I can keep the value across the whole application until the user logs out?\nThis is the service itself:\nimport { Injectable } from '@angular/core';\nimport { Http, Response, Headers } from \"@angular/http\";\n\n@Injectable"} +{"output_text": " page is parsed. This is the best way to do it.\nNow, if you want to bind to the button click, you can do it like this...\n $(document).ready(function(){\n $('.testMe').click(function(){\n var URL=\"${createLink(controller:'book',action:'checkJquery')}\";\n\n $.ajax({\n url:URL,\n data: {id:'1'},\n success: function(", "input_text": " until the page's body is parsed and in the dom.\nNow lets look at the html.\n\n\nThis is defining a button that will call the callAjax() method when it is clicked. 
So lets follow the logic. You create your function that will be executed later. Your page is rendered and the button exists.\nYou click the button which executes the method. That method then gives the function to the document ready to wait for the page to be parsed. But... we already know it is. Cause you called it based on an interaction with the page. So the document ready is pointless.\nAnother point, that call is going to happen -every- time that button is clicked. Meaning your method will happen multiple times, which means your binding will happen multiple times, and so on and so forth.\nYou should really consider binding in your javascript instead of inline in order to separate your concerns and to minimize/eliminate the redundancy.\nSo first off the html would change to be something like..\n\n\nAnd your javascript...\n $(document).ready(function(){\n $('.testMe').click(function(){\n var URL=\"${createLink(controller:'book',action:'checkJquery')}\";\n\n $.ajax({\n url:URL,\n data: {id:'1'},\n success: function(resp){\n console.log(resp);\n $(\"#author\").val(resp.author);\n $(\"#book\").val(resp.bookName);\n }\n });\n });\n });\n\nNow your markup would be only your markup, and your bindings would happen after the"} +{"output_text": "\n\n public void buildChildMenus() {\n JMenu menu = new JMenu(\"Window\");\n menu.add(cascade);\n menu.add(tile);\n add(menu);\n }\n}\n\n/**\n * A simple MDI application that demonstrates the use of the MDI\n * framework.\n */\n class MDI extends JFrame {\n private MDIDesktopPane desktop;\n private JMenuBar menuBar", "input_text": "sets.right, d.getHeight()\n - scrollInsets.top - scrollInsets.bottom);\n }\n\n if (x <= d.getWidth())\n x = ((int) d.getWidth()) - 20;\n if (y <= d.getHeight())\n y = ((int) d.getHeight()) - 20;\n desktop.setAllSize(x, y);\n scrollPane.invalidate();\n scrollPane.validate();\n }\n }\n}\n\n/**\n * Menu component that handles the functionality expected of a standard\n * \"Windows\" menu for MDI applications.\n */\n class WindowMenu extends JMenu {\n private MDIDesktopPane desktop;\n private JMenuItem cascade = new JMenuItem(\"Cascade\");\n private JMenuItem tile = new JMenuItem(\"Tile\");\n\n public WindowMenu(MDIDesktopPane desktop) {\n this.desktop = desktop;\n setText(\"Window\");\n cascade.addActionListener(new ActionListener() {\n public void actionPerformed(ActionEvent ae) {\n WindowMenu.this.desktop.cascadeFrames();\n }\n });\n tile.addActionListener(new ActionListener() {\n public void actionPerformed(ActionEvent ae) {\n WindowMenu.this.desktop.tileFrames();\n }\n });\n addMenuListener(new MenuListener() {\n public void menuCanceled(MenuEvent e) {\n }\n\n public void menuDeselected(MenuEvent e) {\n removeAll();\n }\n\n public void menuSelected(MenuEvent e) {\n buildChildMenus();\n }\n});\n }"} +{"output_text": "=\"table table-striped\">\n \n \n Name\n Description\n Price\n Actions\n \n \n \n \n {{ item.name }}\n <", "input_text": "\nEnable the use of the sensor, if necessary, in the firmware (either by the UI if possible or uploading new firmware) and;\n\nCheck whether the CR-10 controller board uses active LOW to detect a lack of filament (or again, this may be a UI, or firmware, setting)\nYou could test this last point yourself, by applying a LOW level voltage (i.e. grounding) to X+ endstop input on the control board (after checking the UI/firmware for appropriate settings), and see if that halts/pauses the print. If not, try a HIGH level (5\u00a0V). If it is active HIGH then you simply need to use an inverter (i.e. 
4069), between the sensor and the X+ endstop connector.\n\nBoard examples\nThere are two principle board options, v1.x and v2.x, the latter has dual Z stepper motor connectors and the former does not.For example:\nVersion 1.x, for CR-10 firmware - Motherboard Controller DIY Creality 3D\u00ae CR-10 / CR-10S 3D Printer CR-10 Upgrade Control Board12V (CR-10S Mainboard)\n\nVersion 2.x (2.1 in this case) with Dual Z stepper connectors, for CR-10S firmware - Luxnwatts CR-10S Mainboard Replacement Controller Board Upgrade V2.1 Motherboard For Creality S4 S5 3D Printer\n\n Q:\n\nError in filter function for VueJs\n\nI want to filter the following in VueJs 2.\nMy Component is the following:\n\n\ndetails.js\n ready: function() {\n this.$.query.on('value', function(snapshot) {\n var item = snapshot.val();\n", "input_text": "
Date Modified: {{item.dateModified}}\nStatus: {{item.status}}\n
    \n\n\nPolymer({\n is:'my-view1',\n properties: {\n data: {\n notify: true,\n type: Object,\n observer: 'dataChanged'\n }\n },\n dataChanged: function (newData, oldData) {\n console.log(newData[0]);\n // do something when the query returns values?\n },\n transactionCompleted: new Promise(function(resolve, reject) {\n // how can I access \"data\" here? \n })`\n\nA:\n\nI wound up going another way entirely, which seemed to be a cleaner approach to what I was doing anyways. I broke it down into separate components. This way when the detail component was loaded, the ready function would allow me to adjust the data before it got displayed:\nlist.html:\n\n\n\n\n\ndetails.html\n