Model save

Files changed (3)

README.md +138 -0
adapter_config.json +29 -0
adapter_model.safetensors +3 -0

README.md ADDED

	@@ -0,0 +1,138 @@

+---
+library_name: peft
+base_model: peiyi9979/math-shepherd-mistral-7b-prm
+tags:
+- generated_from_trainer
+metrics:
+- accuracy
+- precision
+- recall
+- f1
+model-index:
+- name: v1_mistral_lora
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# v1_mistral_lora
+This model is a fine-tuned version of [peiyi9979/math-shepherd-mistral-7b-prm](https://huggingface.co/peiyi9979/math-shepherd-mistral-7b-prm) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.2947
+- Accuracy: 0.8899
+- Precision: 0.8933
+- Recall: 0.7910
+- F1: 0.8391
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 2e-05
+- train_batch_size: 8
+- eval_batch_size: 8
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 4
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 64
+- total_eval_batch_size: 32
+- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 1
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss | Accuracy | Precision | Recall | F1     |
+|:-------------:|:------:|:----:|:---------------:|:--------:|:---------:|:------:|:------:|
+| No log        | 0      | 0    | 0.6321          | 0.6480   | 0.5125    | 0.6119 | 0.5578 |
+| 0.6598        | 0.0153 | 20   | 0.6284          | 0.6552   | 0.5221    | 0.5871 | 0.5527 |
+| 0.6948        | 0.0306 | 40   | 0.6222          | 0.6787   | 0.5742    | 0.4428 | 0.5000 |
+| 0.6394        | 0.0459 | 60   | 0.6187          | 0.6877   | 0.6228    | 0.3532 | 0.4508 |
+| 0.6466        | 0.0612 | 80   | 0.5946          | 0.7148   | 0.6257    | 0.5323 | 0.5753 |
+| 0.5551        | 0.0765 | 100  | 0.5566          | 0.7256   | 0.6140    | 0.6567 | 0.6346 |
+| 0.5631        | 0.0918 | 120  | 0.4903          | 0.7924   | 0.7500    | 0.6418 | 0.6917 |
+| 0.5009        | 0.1072 | 140  | 0.4552          | 0.7978   | 0.7432    | 0.6766 | 0.7083 |
+| 0.4532        | 0.1225 | 160  | 0.4340          | 0.8267   | 0.8344    | 0.6517 | 0.7318 |
+| 0.3813        | 0.1378 | 180  | 0.4414          | 0.8285   | 0.8630    | 0.6269 | 0.7262 |
+| 0.3897        | 0.1531 | 200  | 0.4202          | 0.8394   | 0.8784    | 0.6468 | 0.7450 |
+| 0.4270        | 0.1684 | 220  | 0.4066          | 0.8430   | 0.8654    | 0.6716 | 0.7563 |
+| 0.3460        | 0.1837 | 240  | 0.4156          | 0.8339   | 0.7685    | 0.7761 | 0.7723 |
+| 0.3623        | 0.1990 | 260  | 0.4000          | 0.8502   | 0.8734    | 0.6866 | 0.7688 |
+| 0.3446        | 0.2143 | 280  | 0.3941          | 0.8520   | 0.8650    | 0.7015 | 0.7747 |
+| 0.2533        | 0.2296 | 300  | 0.3808          | 0.8556   | 0.8954    | 0.6816 | 0.7740 |
+| 0.3451        | 0.2449 | 320  | 0.3897          | 0.8357   | 0.7895    | 0.7463 | 0.7673 |
+| 0.3667        | 0.2602 | 340  | 0.3895          | 0.8375   | 0.7761    | 0.7761 | 0.7761 |
+| 0.3378        | 0.2755 | 360  | 0.3691          | 0.8592   | 0.8773    | 0.7114 | 0.7857 |
+| 0.3216        | 0.2909 | 380  | 0.3751          | 0.8394   | 0.7947    | 0.7512 | 0.7724 |
+| 0.3109        | 0.3062 | 400  | 0.3736          | 0.8538   | 0.8093    | 0.7811 | 0.7949 |
+| 0.2893        | 0.3215 | 420  | 0.3466          | 0.8664   | 0.8802    | 0.7313 | 0.7989 |
+| 0.3635        | 0.3368 | 440  | 0.3490          | 0.8610   | 0.8523    | 0.7463 | 0.7958 |
+| 0.3582        | 0.3521 | 460  | 0.3370          | 0.8718   | 0.8824    | 0.7463 | 0.8086 |
+| 0.3879        | 0.3674 | 480  | 0.3521          | 0.8556   | 0.7980    | 0.8060 | 0.8020 |
+| 0.3741        | 0.3827 | 500  | 0.3298          | 0.8682   | 0.8810    | 0.7363 | 0.8022 |
+| 0.3291        | 0.3980 | 520  | 0.3347          | 0.8628   | 0.8743    | 0.7264 | 0.7935 |
+| 0.3697        | 0.4133 | 540  | 0.3236          | 0.8682   | 0.8636    | 0.7562 | 0.8064 |
+| 0.3143        | 0.4286 | 560  | 0.3294          | 0.8628   | 0.8571    | 0.7463 | 0.7979 |
+| 0.2442        | 0.4439 | 580  | 0.3167          | 0.8700   | 0.8909    | 0.7313 | 0.8033 |
+| 0.3610        | 0.4592 | 600  | 0.3247          | 0.8664   | 0.8360    | 0.7861 | 0.8103 |
+| 0.3877        | 0.4746 | 620  | 0.3325          | 0.8700   | 0.8342    | 0.8010 | 0.8173 |
+| 0.2342        | 0.4899 | 640  | 0.3178          | 0.8736   | 0.8659    | 0.7711 | 0.8158 |
+| 0.2483        | 0.5052 | 660  | 0.3146          | 0.8718   | 0.8963    | 0.7313 | 0.8055 |
+| 0.2841        | 0.5205 | 680  | 0.3226          | 0.8718   | 0.9167    | 0.7114 | 0.8011 |
+| 0.3065        | 0.5358 | 700  | 0.3122          | 0.8845   | 0.9363    | 0.7313 | 0.8212 |
+| 0.2231        | 0.5511 | 720  | 0.3075          | 0.8809   | 0.8689    | 0.7910 | 0.8281 |
+| 0.2701        | 0.5664 | 740  | 0.3041          | 0.8809   | 0.8814    | 0.7761 | 0.8254 |
+| 0.2630        | 0.5817 | 760  | 0.3054          | 0.8773   | 0.8674    | 0.7811 | 0.8220 |
+| 0.3769        | 0.5970 | 780  | 0.3036          | 0.8755   | 0.8708    | 0.7711 | 0.8179 |
+| 0.1840        | 0.6123 | 800  | 0.3055          | 0.8755   | 0.8511    | 0.7960 | 0.8226 |
+| 0.3339        | 0.6276 | 820  | 0.3079          | 0.8773   | 0.8482    | 0.8060 | 0.8265 |
+| 0.2078        | 0.6429 | 840  | 0.3000          | 0.8827   | 0.8736    | 0.7910 | 0.8303 |
+| 0.3542        | 0.6582 | 860  | 0.3014          | 0.8827   | 0.8778    | 0.7861 | 0.8294 |
+| 0.2316        | 0.6736 | 880  | 0.3074          | 0.8755   | 0.8587    | 0.7861 | 0.8208 |
+| 0.2983        | 0.6889 | 900  | 0.3038          | 0.8809   | 0.8771    | 0.7811 | 0.8263 |
+| 0.3039        | 0.7042 | 920  | 0.3024          | 0.8845   | 0.8870    | 0.7811 | 0.8307 |
+| 0.3110        | 0.7195 | 940  | 0.3016          | 0.8827   | 0.8820    | 0.7811 | 0.8285 |
+| 0.4060        | 0.7348 | 960  | 0.3040          | 0.8827   | 0.8617    | 0.8060 | 0.8329 |
+| 0.2306        | 0.7501 | 980  | 0.2975          | 0.8863   | 0.9059    | 0.7662 | 0.8302 |
+| 0.3494        | 0.7654 | 1000 | 0.3009          | 0.8863   | 0.8750    | 0.8010 | 0.8364 |
+| 0.3237        | 0.7807 | 1020 | 0.3034          | 0.8899   | 0.8723    | 0.8159 | 0.8432 |
+| 0.4034        | 0.7960 | 1040 | 0.2988          | 0.8899   | 0.8977    | 0.7861 | 0.8382 |
+| 0.2682        | 0.8113 | 1060 | 0.3001          | 0.8845   | 0.8663    | 0.8060 | 0.8351 |
+| 0.2921        | 0.8266 | 1080 | 0.2982          | 0.8845   | 0.8785    | 0.7910 | 0.8325 |
+| 0.3732        | 0.8419 | 1100 | 0.3003          | 0.8791   | 0.8564    | 0.8010 | 0.8278 |
+| 0.3240        | 0.8573 | 1120 | 0.2997          | 0.8845   | 0.8743    | 0.7960 | 0.8333 |
+| 0.3607        | 0.8726 | 1140 | 0.2987          | 0.8827   | 0.8736    | 0.7910 | 0.8303 |
+| 0.2201        | 0.8879 | 1160 | 0.2960          | 0.8881   | 0.8883    | 0.7910 | 0.8368 |
+| 0.2767        | 0.9032 | 1180 | 0.2949          | 0.8899   | 0.8933    | 0.7910 | 0.8391 |
+| 0.2563        | 0.9185 | 1200 | 0.2939          | 0.8899   | 0.8933    | 0.7910 | 0.8391 |
+| 0.2681        | 0.9338 | 1220 | 0.2956          | 0.8899   | 0.8933    | 0.7910 | 0.8391 |
+| 0.3409        | 0.9491 | 1240 | 0.2950          | 0.8881   | 0.8883    | 0.7910 | 0.8368 |
+| 0.3316        | 0.9644 | 1260 | 0.2939          | 0.8899   | 0.8933    | 0.7910 | 0.8391 |
+| 0.1957        | 0.9797 | 1280 | 0.2946          | 0.8899   | 0.8933    | 0.7910 | 0.8391 |
+| 0.2439        | 0.9950 | 1300 | 0.2947          | 0.8899   | 0.8933    | 0.7910 | 0.8391 |
+### Framework versions
+- PEFT 0.13.2
+- Transformers 4.46.0
+- Pytorch 2.5.1+cu124
+- Datasets 3.1.0
+- Tokenizers 0.20.3
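
The numbers in the model card above can be cross-checked with a little arithmetic. The sketch below recomputes the effective batch size, recomputes F1 from the final precision and recall, and illustrates what a cosine schedule with a 10% linear warmup might look like. The total step count is an assumption extrapolated from the last logged row (step 1300 at epoch 0.9950), and `cosine_lr` is a hypothetical helper written for illustration, not necessarily the exact scheduler the Trainer used.

```python
import math

# Values copied from the model card's hyperparameter list.
train_batch_size = 8             # per-device batch size
num_devices = 4
gradient_accumulation_steps = 2
learning_rate = 2e-05
warmup_ratio = 0.1

# Effective batch size per optimizer step.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 64

# F1 is the harmonic mean of precision and recall (final evaluation row).
precision, recall = 0.8933, 0.7910
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.839

# ASSUMPTION: the run length is extrapolated from the last logged row
# (step 1300 at epoch 0.9950); the card does not state the total step count.
estimated_total_steps = round(1300 / 0.9950)


def cosine_lr(step, total_steps, base_lr, warmup_ratio):
    """Hypothetical helper: linear warmup followed by half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Learning rate at a few points of the (estimated) run.
for step in (0, 65, 131, 700, estimated_total_steps):
    print(step, cosine_lr(step, estimated_total_steps, learning_rate, warmup_ratio))
```

The recomputed F1 differs from the reported 0.8391 only in the fourth decimal place, which is consistent with precision and recall having been rounded before being written to the card.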

adapter_config.json ADDED

	@@ -0,0 +1,29 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "peiyi9979/math-shepherd-mistral-7b-prm",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
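
To make the LoRA fields in this config more concrete, here is a minimal sketch that parses a subset of the JSON above and derives the scaling factor applied to the low-rank update. It assumes the usual LoRA convention of `lora_alpha / r`, with `lora_alpha / sqrt(r)` when rank-stabilized LoRA is enabled; `lora_scaling` is a hypothetical helper, not a PEFT function.

```python
import json
import math

# Subset of adapter_config.json, copied from the diff above.
adapter_config = json.loads("""
{
  "lora_alpha": 32,
  "lora_dropout": 0.05,
  "r": 16,
  "target_modules": ["q_proj", "v_proj"],
  "use_rslora": false
}
""")


def lora_scaling(cfg):
    """Hypothetical helper: scaling factor multiplied onto the B @ A update."""
    rank = cfg["r"]
    if cfg.get("use_rslora", False):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return cfg["lora_alpha"] / math.sqrt(rank)
    return cfg["lora_alpha"] / rank


scaling = lora_scaling(adapter_config)
print(scaling)  # 2.0
print(adapter_config["target_modules"])  # ['q_proj', 'v_proj']
```

With `r = 16` and `lora_alpha = 32`, the low-rank update to each query and value projection is scaled by 2 before being added to the frozen base weights.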

adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:68265fcd49073c4c7d9863237786db937f9103f7fd793a07cee72963f908c557
+size 27280152
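
The pointer above stores only a hash and a byte count, but the size can be sanity-checked against the adapter config. The sketch below parses the pointer and compares its size with an estimate of the LoRA weight footprint. The layer shapes are assumptions based on the standard Mistral-7B architecture (hidden size 4096, 32 layers, 8 key/value heads of dimension 128), which this commit does not state; `parse_lfs_pointer` and `lora_params` are hypothetical helpers.

```python
# Git LFS pointer, copied from the diff above.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:68265fcd49073c4c7d9863237786db937f9103f7fd793a07cee72963f908c557
size 27280152
"""


def parse_lfs_pointer(text):
    """Hypothetical helper: split each 'key value' line into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


pointer = parse_lfs_pointer(pointer_text)
file_size = int(pointer["size"])

# ASSUMPTION: standard Mistral-7B shapes; not stated anywhere in this commit.
hidden_size = 4096
num_layers = 32
kv_dim = 8 * 128   # 8 key/value heads x head dimension 128
rank = 16          # "r" from adapter_config.json


def lora_params(in_features, out_features, r):
    """Parameters in one LoRA pair: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


per_layer = lora_params(hidden_size, hidden_size, rank)   # q_proj
per_layer += lora_params(hidden_size, kv_dim, rank)       # v_proj
total_params = per_layer * num_layers
fp32_bytes = total_params * 4

print(total_params)            # 6815744
print(fp32_bytes)              # 27262976
print(file_size - fp32_bytes)  # 17176
```

Under these assumptions the file is about 17 KB larger than the raw float32 weights, which would be consistent with a safetensors header describing the tensors. This suggests, but does not prove, that the adapter was saved in float32.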