zodiache committed on
Commit 56e9896 · verified · 1 parent: 24f4ebb

Model save

Files changed (4):
  1. README.md +41 -41
  2. adapter_model.safetensors +1 -1
  3. all_results.json +6 -6
  4. train_results.json +6 -6
README.md CHANGED
@@ -18,7 +18,7 @@ should probably proofread and complete it, then remove this comment. -->
 
  This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.1095
+ - Loss: 0.4651
 
  ## Model description
 
@@ -52,46 +52,46 @@ The following hyperparameters were used during training:
 
  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
- | 0.1498 | 0.1110 | 100 | 0.1544 |
- | 0.0879 | 0.2220 | 200 | 0.1207 |
- | 0.1002 | 0.3330 | 300 | 0.1030 |
- | 0.0539 | 0.4440 | 400 | 0.0757 |
- | 0.0268 | 0.5550 | 500 | 0.1053 |
- | 0.0411 | 0.6660 | 600 | 0.0819 |
- | 0.0511 | 0.7770 | 700 | 0.0916 |
- | 0.0371 | 0.8880 | 800 | 0.0793 |
- | 0.0336 | 0.9990 | 900 | 0.0845 |
- | 0.038 | 1.1100 | 1000 | 0.0783 |
- | 0.0398 | 1.2210 | 1100 | 0.0818 |
- | 0.04 | 1.3320 | 1200 | 0.0668 |
- | 0.0324 | 1.4430 | 1300 | 0.0765 |
- | 0.025 | 1.5540 | 1400 | 0.0789 |
- | 0.0656 | 1.6650 | 1500 | 0.0971 |
- | 0.0324 | 1.7761 | 1600 | 0.0752 |
- | 0.0382 | 1.8871 | 1700 | 0.0748 |
- | 0.0765 | 1.9981 | 1800 | 0.0721 |
- | 0.0236 | 2.1091 | 1900 | 0.0854 |
- | 0.0204 | 2.2201 | 2000 | 0.0747 |
- | 0.0528 | 2.3311 | 2100 | 0.0868 |
- | 0.0222 | 2.4421 | 2200 | 0.0832 |
- | 0.0178 | 2.5531 | 2300 | 0.0770 |
- | 0.0284 | 2.6641 | 2400 | 0.0772 |
- | 0.0455 | 2.7751 | 2500 | 0.0854 |
- | 0.0168 | 2.8861 | 2600 | 0.0830 |
- | 0.0251 | 2.9971 | 2700 | 0.0913 |
- | 0.0242 | 3.1081 | 2800 | 0.1084 |
- | 0.0168 | 3.2191 | 2900 | 0.0974 |
- | 0.0096 | 3.3301 | 3000 | 0.1091 |
- | 0.0132 | 3.4411 | 3100 | 0.1040 |
- | 0.0182 | 3.5521 | 3200 | 0.0944 |
- | 0.0315 | 3.6631 | 3300 | 0.1017 |
- | 0.0162 | 3.7741 | 3400 | 0.0974 |
- | 0.0054 | 3.8851 | 3500 | 0.1025 |
- | 0.0144 | 3.9961 | 3600 | 0.1054 |
- | 0.0104 | 4.1071 | 3700 | 0.1084 |
- | 0.0138 | 4.2181 | 3800 | 0.1091 |
- | 0.0103 | 4.3291 | 3900 | 0.1092 |
- | 0.014 | 4.4401 | 4000 | 0.1095 |
+ | 0.0811 | 0.1892 | 100 | 0.2884 |
+ | 0.0461 | 0.3784 | 200 | 0.1991 |
+ | 0.0421 | 0.5676 | 300 | 0.2228 |
+ | 0.0307 | 0.7569 | 400 | 0.1926 |
+ | 0.0729 | 0.9461 | 500 | 0.1457 |
+ | 0.0344 | 1.1353 | 600 | 0.1295 |
+ | 0.0257 | 1.3245 | 700 | 0.2353 |
+ | 0.0379 | 1.5137 | 800 | 0.2110 |
+ | 0.0247 | 1.7029 | 900 | 0.1904 |
+ | 0.0018 | 1.8921 | 1000 | 0.2594 |
+ | 0.0115 | 2.0814 | 1100 | 0.2567 |
+ | 0.0059 | 2.2706 | 1200 | 0.2599 |
+ | 0.0225 | 2.4598 | 1300 | 0.2947 |
+ | 0.0149 | 2.6490 | 1400 | 0.2559 |
+ | 0.0293 | 2.8382 | 1500 | 0.2606 |
+ | 0.0026 | 3.0274 | 1600 | 0.2469 |
+ | 0.0145 | 3.2167 | 1700 | 0.2146 |
+ | 0.0004 | 3.4059 | 1800 | 0.3081 |
+ | 0.0117 | 3.5951 | 1900 | 0.3059 |
+ | 0.0207 | 3.7843 | 2000 | 0.3001 |
+ | 0.0061 | 3.9735 | 2100 | 0.3827 |
+ | 0.0072 | 4.1627 | 2200 | 0.3541 |
+ | 0.0348 | 4.3519 | 2300 | 0.3904 |
+ | 0.0019 | 4.5412 | 2400 | 0.3549 |
+ | 0.0031 | 4.7304 | 2500 | 0.3791 |
+ | 0.0009 | 4.9196 | 2600 | 0.4193 |
+ | 0.0011 | 5.1088 | 2700 | 0.4539 |
+ | 0.0251 | 5.2980 | 2800 | 0.4403 |
+ | 0.0008 | 5.4872 | 2900 | 0.4527 |
+ | 0.0085 | 5.6764 | 3000 | 0.4156 |
+ | 0.0013 | 5.8657 | 3100 | 0.4183 |
+ | 0.0007 | 6.0549 | 3200 | 0.4241 |
+ | 0.0025 | 6.2441 | 3300 | 0.4420 |
+ | 0.0029 | 6.4333 | 3400 | 0.4514 |
+ | 0.0041 | 6.6225 | 3500 | 0.4619 |
+ | 0.0009 | 6.8117 | 3600 | 0.4452 |
+ | 0.0001 | 7.0009 | 3700 | 0.4656 |
+ | 0.0007 | 7.1902 | 3800 | 0.4603 |
+ | 0.0014 | 7.3794 | 3900 | 0.4651 |
+ | 0.0175 | 7.5686 | 4000 | 0.4651 |
 
 
  ### Framework versions
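A quick consistency check on the new training-log table (a sketch, assuming the Epoch column scales linearly with Step, which the 100-step logging cadence suggests):

```python
# From the new table, the epoch counter advances ~0.1892 per 100 steps,
# so one epoch is roughly 100 / 0.1892 ≈ 528.5 optimizer steps.
steps_per_epoch = 100 / 0.1892

# Extrapolating to the final logged step reproduces the last row's epoch.
epoch_at_step_4000 = 4000 / steps_per_epoch  # ≈ 7.568, vs. 7.5686 in the table
```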
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:0d6a5c0d7e838d834ec996a6b55bd31a86a108b793eeb7f291e9e0bc0efcf3d9
+ oid sha256:f9415f3a23a12844000b44f2bf763a6db9aef6a37cce0e9450206eaa07c6c811
  size 2115012328
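The three lines above are a Git LFS pointer stub, not the weights themselves: the real 2,115,012,328-byte safetensors file is addressed by its SHA-256 digest. A minimal sketch of parsing such a pointer (the `parse_lfs_pointer` helper is hypothetical, written from the `version`/`oid`/`size` lines shown in the diff):

```python
def parse_lfs_pointer(text: str) -> dict:
    # Each pointer line is "<key> <value>"; the oid value is "<algo>:<hexdigest>".
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the actual object, in bytes
    }

# The new (post-commit) pointer contents from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:f9415f3a23a12844000b44f2bf763a6db9aef6a37cce0e9450206eaa07c6c811
size 2115012328
"""
info = parse_lfs_pointer(pointer)
```

After downloading the real file, `sha256sum adapter_model.safetensors` should match `info["digest"]`; note the size is unchanged across the commit, only the digest differs.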
all_results.json CHANGED
@@ -1,8 +1,8 @@
  {
- "epoch": 4.546690717358124,
- "total_flos": 3.132739906912813e+18,
- "train_loss": 0.0763904440615022,
- "train_runtime": 34770.128,
- "train_samples_per_second": 7.539,
- "train_steps_per_second": 0.118
+ "epoch": 7.750236518448439,
+ "total_flos": 3.618461896742535e+18,
+ "train_loss": 0.06011972692103562,
+ "train_runtime": 39228.4809,
+ "train_samples_per_second": 6.682,
+ "train_steps_per_second": 0.104
  }
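The throughput figures on the "+" side are self-consistent with the training log: runtime times throughput recovers the step count, and the samples/steps ratio gives the effective batch size. A sketch using only the numbers in the diff:

```python
# Values copied from the updated all_results.json.
train_runtime = 39228.4809       # seconds
samples_per_second = 6.682
steps_per_second = 0.104

total_samples = train_runtime * samples_per_second   # ≈ 262,000 samples seen
total_steps = train_runtime * steps_per_second       # ≈ 4,080 optimizer steps
effective_batch = samples_per_second / steps_per_second  # ≈ 64 samples/step
```

The ≈4,080 derived steps line up with the README table, whose last logged row is step 4000.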
train_results.json CHANGED
@@ -1,8 +1,8 @@
  {
- "epoch": 4.546690717358124,
- "total_flos": 3.132739906912813e+18,
- "train_loss": 0.0763904440615022,
- "train_runtime": 34770.128,
- "train_samples_per_second": 7.539,
- "train_steps_per_second": 0.118
+ "epoch": 7.750236518448439,
+ "total_flos": 3.618461896742535e+18,
+ "train_loss": 0.06011972692103562,
+ "train_runtime": 39228.4809,
+ "train_samples_per_second": 6.682,
+ "train_steps_per_second": 0.104
  }