Training in progress, step 50, checkpoint

Files changed (11)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +31 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +3 -0
last-checkpoint/tokenizer_config.json +48 -0
last-checkpoint/trainer_state.json +408 -0
last-checkpoint/training_args.bin +3 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: bigscience/bloom-560m
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "bigscience/bloom-560m",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 128,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "query_key_value",
+    "dense",
+    "dense_4h_to_h",
+    "dense_h_to_4h"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:27df9bde2504b07a4433373d117fe0dcf1f8f68461037f8d783b83bfd88ba700
+size 100690288
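
As a rough cross-check of the adapter configuration against the file size above, the sketch below estimates the LoRA parameter count. The values `r`, `lora_alpha`, and `target_modules` come from `adapter_config.json`; the hidden size (1024) and layer count (24) are assumptions taken from the published `bigscience/bloom-560m` configuration and do not appear in this commit. The float32 storage assumption is likewise inferred, not stated.

```python
# Assumed BLOOM-560m dimensions (from the base model's published config,
# NOT from this commit).
HIDDEN_SIZE = 1024
NUM_LAYERS = 24

# Values copied from adapter_config.json in this commit.
LORA_R = 64
LORA_ALPHA = 128

# (in_features, out_features) of each targeted linear layer in one block,
# assuming the standard BLOOM layout (fused QKV, 4x MLP expansion).
TARGET_SHAPES = {
    "query_key_value": (HIDDEN_SIZE, 3 * HIDDEN_SIZE),
    "dense": (HIDDEN_SIZE, HIDDEN_SIZE),
    "dense_h_to_4h": (HIDDEN_SIZE, 4 * HIDDEN_SIZE),
    "dense_4h_to_h": (4 * HIDDEN_SIZE, HIDDEN_SIZE),
}


def lora_param_count(r, shapes, num_layers):
    """Each adapted layer adds A (r x in) and B (out x r): r * (in + out) weights."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


# With use_rslora = false, the LoRA update is scaled by lora_alpha / r.
scaling = LORA_ALPHA / LORA_R
n_params = lora_param_count(LORA_R, TARGET_SHAPES, NUM_LAYERS)
fp32_bytes = n_params * 4  # assumes float32 storage

print(f"scaling={scaling}, params={n_params}, fp32 bytes={fp32_bytes}")
```

Under these assumptions the estimate is 25,165,824 parameters, or 100,663,296 bytes in float32. That is 26,992 bytes less than the recorded `size 100690288`, a gap that would be consistent with a safetensors header, though this commit does not confirm the storage dtype.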

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e84acff8bd7167a5e2d3133a7649246251276308539c18cf0db18bd9cacb6546
+size 51344890

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3a738ebc0a83a29aa92fffb459fdeca3e0d70407ae8195022a12ab02632a58af
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0c0ba5c36022182918d871b3b9db68399242f10f6adc246d0190769469e50e3a
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d963066d6adae5034a1dc114c3ac444512de09928cf14ed4562ba94d9a440e66
+size 21763085

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,48 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "merges_file": null,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<pad>",
+  "padding_side": "left",
+  "tokenizer_class": "BloomTokenizer",
+  "unk_token": "<unk>",
+  "vocab_file": null
+}
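
The `chat_template` value above is a single-line Jinja template, which is hard to read in diff form. The sketch below is a plain-Python equivalent of its logic, written only to show what string it produces; `render_chat` is a hypothetical helper name and is not part of the tokenizer or of any library. It assumes Jinja's `trim` filter behaves like `str.strip()` and that the filter applies to the message content only.

```python
def render_chat(messages, bos_token="<s>", add_generation_prompt=False):
    """Plain-Python mirror of the Jinja chat_template in tokenizer_config.json."""
    parts = []
    for index, message in enumerate(messages):
        # Header with the role, a blank line, the trimmed content, then <|eot_id|>.
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip()
            + "<|eot_id|>"
        )
        # Only the first message is prefixed with the BOS token.
        if index == 0:
            content = bos_token + content
        parts.append(content)
    # Optionally open an assistant turn for the model to complete.
    if add_generation_prompt:
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


example = render_chat(
    [{"role": "user", "content": "  Hello  "}],
    add_generation_prompt=True,
)
print(repr(example))
```

Note that `added_tokens_decoder` above lists only `<unk>`, `<s>`, `</s>`, and `<pad>`, so markers such as `<|start_header_id|>` are presumably tokenized as ordinary text by this tokenizer; the commit does not say whether that is intended.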

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,408 @@

+{
+  "best_metric": 1.4904803037643433,
+  "best_model_checkpoint": "miner_id_24/checkpoint-50",
+  "epoch": 0.044365572315882874,
+  "eval_steps": 50,
+  "global_step": 50,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0008873114463176575,
+      "grad_norm": 1.4210379123687744,
+      "learning_rate": 1.009e-05,
+      "loss": 2.2728,
+      "step": 1
+    },
+    {
+      "epoch": 0.0008873114463176575,
+      "eval_loss": 3.799154758453369,
+      "eval_runtime": 24.9016,
+      "eval_samples_per_second": 19.075,
+      "eval_steps_per_second": 4.779,
+      "step": 1
+    },
+    {
+      "epoch": 0.001774622892635315,
+      "grad_norm": 1.4995695352554321,
+      "learning_rate": 2.018e-05,
+      "loss": 2.4509,
+      "step": 2
+    },
+    {
+      "epoch": 0.0026619343389529724,
+      "grad_norm": 2.3533148765563965,
+      "learning_rate": 3.027e-05,
+      "loss": 3.6053,
+      "step": 3
+    },
+    {
+      "epoch": 0.00354924578527063,
+      "grad_norm": 2.306748628616333,
+      "learning_rate": 4.036e-05,
+      "loss": 3.5041,
+      "step": 4
+    },
+    {
+      "epoch": 0.0044365572315882874,
+      "grad_norm": 2.498638153076172,
+      "learning_rate": 5.045e-05,
+      "loss": 2.7132,
+      "step": 5
+    },
+    {
+      "epoch": 0.005323868677905945,
+      "grad_norm": 2.506669044494629,
+      "learning_rate": 6.054e-05,
+      "loss": 3.3575,
+      "step": 6
+    },
+    {
+      "epoch": 0.006211180124223602,
+      "grad_norm": 2.9541540145874023,
+      "learning_rate": 7.062999999999999e-05,
+      "loss": 3.1319,
+      "step": 7
+    },
+    {
+      "epoch": 0.00709849157054126,
+      "grad_norm": 2.9361417293548584,
+      "learning_rate": 8.072e-05,
+      "loss": 3.3045,
+      "step": 8
+    },
+    {
+      "epoch": 0.007985803016858917,
+      "grad_norm": 4.139509677886963,
+      "learning_rate": 9.081e-05,
+      "loss": 4.4589,
+      "step": 9
+    },
+    {
+      "epoch": 0.008873114463176575,
+      "grad_norm": 4.182487487792969,
+      "learning_rate": 0.0001009,
+      "loss": 2.9196,
+      "step": 10
+    },
+    {
+      "epoch": 0.009760425909494233,
+      "grad_norm": 4.3015875816345215,
+      "learning_rate": 0.00010036894736842106,
+      "loss": 4.3239,
+      "step": 11
+    },
+    {
+      "epoch": 0.01064773735581189,
+      "grad_norm": 7.319455623626709,
+      "learning_rate": 9.98378947368421e-05,
+      "loss": 5.8161,
+      "step": 12
+    },
+    {
+      "epoch": 0.011535048802129548,
+      "grad_norm": 5.4054059982299805,
+      "learning_rate": 9.930684210526315e-05,
+      "loss": 4.2783,
+      "step": 13
+    },
+    {
+      "epoch": 0.012422360248447204,
+      "grad_norm": 5.388209819793701,
+      "learning_rate": 9.877578947368421e-05,
+      "loss": 3.6564,
+      "step": 14
+    },
+    {
+      "epoch": 0.013309671694764862,
+      "grad_norm": 5.456456661224365,
+      "learning_rate": 9.824473684210527e-05,
+      "loss": 3.1307,
+      "step": 15
+    },
+    {
+      "epoch": 0.01419698314108252,
+      "grad_norm": 6.29660177230835,
+      "learning_rate": 9.771368421052632e-05,
+      "loss": 3.0369,
+      "step": 16
+    },
+    {
+      "epoch": 0.015084294587400177,
+      "grad_norm": 6.667815208435059,
+      "learning_rate": 9.718263157894736e-05,
+      "loss": 3.0679,
+      "step": 17
+    },
+    {
+      "epoch": 0.015971606033717833,
+      "grad_norm": 7.545135498046875,
+      "learning_rate": 9.665157894736842e-05,
+      "loss": 4.7117,
+      "step": 18
+    },
+    {
+      "epoch": 0.01685891748003549,
+      "grad_norm": 6.274733066558838,
+      "learning_rate": 9.612052631578948e-05,
+      "loss": 3.1135,
+      "step": 19
+    },
+    {
+      "epoch": 0.01774622892635315,
+      "grad_norm": 6.953660011291504,
+      "learning_rate": 9.558947368421052e-05,
+      "loss": 3.1242,
+      "step": 20
+    },
+    {
+      "epoch": 0.018633540372670808,
+      "grad_norm": 8.701539993286133,
+      "learning_rate": 9.505842105263159e-05,
+      "loss": 3.7078,
+      "step": 21
+    },
+    {
+      "epoch": 0.019520851818988466,
+      "grad_norm": 13.722517967224121,
+      "learning_rate": 9.452736842105263e-05,
+      "loss": 4.795,
+      "step": 22
+    },
+    {
+      "epoch": 0.02040816326530612,
+      "grad_norm": 11.242803573608398,
+      "learning_rate": 9.399631578947368e-05,
+      "loss": 4.7846,
+      "step": 23
+    },
+    {
+      "epoch": 0.02129547471162378,
+      "grad_norm": 10.049261093139648,
+      "learning_rate": 9.346526315789474e-05,
+      "loss": 3.9411,
+      "step": 24
+    },
+    {
+      "epoch": 0.022182786157941437,
+      "grad_norm": 12.361631393432617,
+      "learning_rate": 9.293421052631578e-05,
+      "loss": 3.6075,
+      "step": 25
+    },
+    {
+      "epoch": 0.023070097604259095,
+      "grad_norm": 9.290053367614746,
+      "learning_rate": 9.240315789473684e-05,
+      "loss": 3.9432,
+      "step": 26
+    },
+    {
+      "epoch": 0.023957409050576754,
+      "grad_norm": 14.21732234954834,
+      "learning_rate": 9.18721052631579e-05,
+      "loss": 4.1622,
+      "step": 27
+    },
+    {
+      "epoch": 0.024844720496894408,
+      "grad_norm": 13.092656135559082,
+      "learning_rate": 9.134105263157895e-05,
+      "loss": 4.1081,
+      "step": 28
+    },
+    {
+      "epoch": 0.025732031943212066,
+      "grad_norm": 12.007771492004395,
+      "learning_rate": 9.081e-05,
+      "loss": 3.3797,
+      "step": 29
+    },
+    {
+      "epoch": 0.026619343389529725,
+      "grad_norm": 11.524943351745605,
+      "learning_rate": 9.027894736842105e-05,
+      "loss": 3.3453,
+      "step": 30
+    },
+    {
+      "epoch": 0.027506654835847383,
+      "grad_norm": 7.809002876281738,
+      "learning_rate": 8.97478947368421e-05,
+      "loss": 3.1803,
+      "step": 31
+    },
+    {
+      "epoch": 0.02839396628216504,
+      "grad_norm": 12.06727409362793,
+      "learning_rate": 8.921684210526316e-05,
+      "loss": 3.955,
+      "step": 32
+    },
+    {
+      "epoch": 0.029281277728482696,
+      "grad_norm": 7.961983680725098,
+      "learning_rate": 8.86857894736842e-05,
+      "loss": 2.466,
+      "step": 33
+    },
+    {
+      "epoch": 0.030168589174800354,
+      "grad_norm": 9.351600646972656,
+      "learning_rate": 8.815473684210527e-05,
+      "loss": 2.7989,
+      "step": 34
+    },
+    {
+      "epoch": 0.031055900621118012,
+      "grad_norm": 9.18805980682373,
+      "learning_rate": 8.762368421052631e-05,
+      "loss": 2.0216,
+      "step": 35
+    },
+    {
+      "epoch": 0.03194321206743567,
+      "grad_norm": 11.7416353225708,
+      "learning_rate": 8.709263157894737e-05,
+      "loss": 3.1537,
+      "step": 36
+    },
+    {
+      "epoch": 0.032830523513753325,
+      "grad_norm": 14.017309188842773,
+      "learning_rate": 8.656157894736843e-05,
+      "loss": 2.7026,
+      "step": 37
+    },
+    {
+      "epoch": 0.03371783496007098,
+      "grad_norm": 8.739339828491211,
+      "learning_rate": 8.603052631578947e-05,
+      "loss": 1.5622,
+      "step": 38
+    },
+    {
+      "epoch": 0.03460514640638864,
+      "grad_norm": 8.707830429077148,
+      "learning_rate": 8.549947368421052e-05,
+      "loss": 1.5255,
+      "step": 39
+    },
+    {
+      "epoch": 0.0354924578527063,
+      "grad_norm": 8.397041320800781,
+      "learning_rate": 8.496842105263158e-05,
+      "loss": 1.6915,
+      "step": 40
+    },
+    {
+      "epoch": 0.03637976929902396,
+      "grad_norm": 10.48680305480957,
+      "learning_rate": 8.443736842105264e-05,
+      "loss": 1.54,
+      "step": 41
+    },
+    {
+      "epoch": 0.037267080745341616,
+      "grad_norm": 12.486153602600098,
+      "learning_rate": 8.390631578947369e-05,
+      "loss": 2.9895,
+      "step": 42
+    },
+    {
+      "epoch": 0.038154392191659274,
+      "grad_norm": 10.01754379272461,
+      "learning_rate": 8.337526315789473e-05,
+      "loss": 1.6676,
+      "step": 43
+    },
+    {
+      "epoch": 0.03904170363797693,
+      "grad_norm": 6.2602691650390625,
+      "learning_rate": 8.284421052631579e-05,
+      "loss": 1.0073,
+      "step": 44
+    },
+    {
+      "epoch": 0.03992901508429459,
+      "grad_norm": 8.267836570739746,
+      "learning_rate": 8.231315789473685e-05,
+      "loss": 1.4826,
+      "step": 45
+    },
+    {
+      "epoch": 0.04081632653061224,
+      "grad_norm": 8.214778900146484,
+      "learning_rate": 8.178210526315789e-05,
+      "loss": 1.6786,
+      "step": 46
+    },
+    {
+      "epoch": 0.0417036379769299,
+      "grad_norm": 4.049341201782227,
+      "learning_rate": 8.125105263157894e-05,
+      "loss": 0.4999,
+      "step": 47
+    },
+    {
+      "epoch": 0.04259094942324756,
+      "grad_norm": 6.359563827514648,
+      "learning_rate": 8.072e-05,
+      "loss": 0.9773,
+      "step": 48
+    },
+    {
+      "epoch": 0.043478260869565216,
+      "grad_norm": 11.562561988830566,
+      "learning_rate": 8.018894736842106e-05,
+      "loss": 1.3988,
+      "step": 49
+    },
+    {
+      "epoch": 0.044365572315882874,
+      "grad_norm": 14.666047096252441,
+      "learning_rate": 7.965789473684211e-05,
+      "loss": 1.702,
+      "step": 50
+    },
+    {
+      "epoch": 0.044365572315882874,
+      "eval_loss": 1.4904803037643433,
+      "eval_runtime": 25.0573,
+      "eval_samples_per_second": 18.957,
+      "eval_steps_per_second": 4.749,
+      "step": 50
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 200,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 50,
+  "stateful_callbacks": {
+    "EarlyStoppingCallback": {
+      "args": {
+        "early_stopping_patience": 5,
+        "early_stopping_threshold": 0.0
+      },
+      "attributes": {
+        "early_stopping_patience_counter": 0
+      }
+    },
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 837005642563584.0,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
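
The `learning_rate` and `epoch` values in `log_history` follow a regular pattern that can be reproduced numerically. The sketch below assumes linear warmup over 10 steps to a peak of 1.009e-4, then linear decay reaching zero at `max_steps` (200), with 1127 optimizer steps per epoch. The warmup length, peak, and steps-per-epoch figures are inferred from the logged numbers; they are not stated in `trainer_state.json`, and the actual scheduler settings live in `training_args.bin`.

```python
PEAK_LR = 1.009e-4      # value logged at step 10 (inferred peak)
WARMUP_STEPS = 10       # inferred from the log, not stated in the file
MAX_STEPS = 200         # "max_steps" in trainer_state.json
STEPS_PER_EPOCH = 1127  # inferred: 1 / 0.0008873114463176575


def learning_rate(step):
    """Linear warmup to PEAK_LR, then linear decay to zero at MAX_STEPS."""
    if step <= WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * (MAX_STEPS - step) / (MAX_STEPS - WARMUP_STEPS)


def epoch_fraction(step):
    """Fraction of one epoch completed after `step` optimizer steps."""
    return step / STEPS_PER_EPOCH


for step in (1, 10, 11, 50):
    print(step, learning_rate(step), epoch_fraction(step))
```

If this reading is right, the run stops at 200 steps, well short of one epoch, so `num_train_epochs: 1` would not be the binding limit.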

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:876200ce2ec7caf67d7f043e743a813dab87cc6aeb05a958e5731502c7423971
+size 6840
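
The binary files in this commit appear only as Git LFS pointers: three `key value` lines giving the spec version, the SHA-256 object ID, and the size in bytes. A minimal sketch of reading one such pointer follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, and the sample text is the `training_args.bin` pointer above with the diff's leading `+` removed.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", separated by a single space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:876200ce2ec7caf67d7f043e743a813dab87cc6aeb05a958e5731502c7423971
size 6840
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

To check a downloaded file against its pointer, one could compare `hashlib.sha256(data).hexdigest()` with `info["digest"]` and `len(data)` with `info["size"]`.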