Commit 1e225d4 (verified) · committed by JT000 · 1 parent: b549b70

End of training

Files changed (2):
  1. README.md +44 -74
  2. model.safetensors +1 -1
README.md CHANGED
@@ -14,7 +14,7 @@ should probably proofread and complete it, then remove this comment. -->
  
  This model is a fine-tuned version of [uer/gpt2-chinese-cluecorpussmall](https://huggingface.co/uer/gpt2-chinese-cluecorpussmall) on an unknown dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.1190
+ - Loss: 0.1152
  
  ## Model description
  
@@ -34,89 +34,59 @@ More information needed
  
  The following hyperparameters were used during training:
  - learning_rate: 2e-05
- - train_batch_size: 40
- - eval_batch_size: 40
+ - train_batch_size: 30
+ - eval_batch_size: 30
  - seed: 42
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
  - lr_scheduler_warmup_steps: 500
- - num_epochs: 70
+ - num_epochs: 40
  - mixed_precision_training: Native AMP
  
  ### Training results
  
  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:-----:|:----:|:---------------:|
- | No log | 1.0 | 10 | 0.7439 |
- | No log | 2.0 | 20 | 0.7049 |
- | No log | 3.0 | 30 | 0.6436 |
- | No log | 4.0 | 40 | 0.5676 |
- | No log | 5.0 | 50 | 0.4825 |
- | No log | 6.0 | 60 | 0.3811 |
- | No log | 7.0 | 70 | 0.2679 |
- | No log | 8.0 | 80 | 0.1747 |
- | No log | 9.0 | 90 | 0.1365 |
- | No log | 10.0 | 100 | 0.1270 |
- | No log | 11.0 | 110 | 0.1241 |
- | No log | 12.0 | 120 | 0.1226 |
- | No log | 13.0 | 130 | 0.1205 |
- | No log | 14.0 | 140 | 0.1195 |
- | No log | 15.0 | 150 | 0.1181 |
- | No log | 16.0 | 160 | 0.1166 |
- | No log | 17.0 | 170 | 0.1129 |
- | No log | 18.0 | 180 | 0.1144 |
- | No log | 19.0 | 190 | 0.1102 |
- | No log | 20.0 | 200 | 0.1084 |
- | No log | 21.0 | 210 | 0.1052 |
- | No log | 22.0 | 220 | 0.1056 |
- | No log | 23.0 | 230 | 0.1042 |
- | No log | 24.0 | 240 | 0.1052 |
- | No log | 25.0 | 250 | 0.1008 |
- | No log | 26.0 | 260 | 0.1008 |
- | No log | 27.0 | 270 | 0.1025 |
- | No log | 28.0 | 280 | 0.1010 |
- | No log | 29.0 | 290 | 0.0991 |
- | No log | 30.0 | 300 | 0.0993 |
- | No log | 31.0 | 310 | 0.1001 |
- | No log | 32.0 | 320 | 0.1026 |
- | No log | 33.0 | 330 | 0.0997 |
- | No log | 34.0 | 340 | 0.1006 |
- | No log | 35.0 | 350 | 0.1034 |
- | No log | 36.0 | 360 | 0.1021 |
- | No log | 37.0 | 370 | 0.1009 |
- | No log | 38.0 | 380 | 0.1025 |
- | No log | 39.0 | 390 | 0.1044 |
- | No log | 40.0 | 400 | 0.1046 |
- | No log | 41.0 | 410 | 0.1094 |
- | No log | 42.0 | 420 | 0.1066 |
- | No log | 43.0 | 430 | 0.1067 |
- | No log | 44.0 | 440 | 0.1103 |
- | No log | 45.0 | 450 | 0.1099 |
- | No log | 46.0 | 460 | 0.1077 |
- | No log | 47.0 | 470 | 0.1080 |
- | No log | 48.0 | 480 | 0.1109 |
- | No log | 49.0 | 490 | 0.1120 |
- | 0.1438 | 50.0 | 500 | 0.1159 |
- | 0.1438 | 51.0 | 510 | 0.1121 |
- | 0.1438 | 52.0 | 520 | 0.1130 |
- | 0.1438 | 53.0 | 530 | 0.1157 |
- | 0.1438 | 54.0 | 540 | 0.1153 |
- | 0.1438 | 55.0 | 550 | 0.1155 |
- | 0.1438 | 56.0 | 560 | 0.1145 |
- | 0.1438 | 57.0 | 570 | 0.1150 |
- | 0.1438 | 58.0 | 580 | 0.1151 |
- | 0.1438 | 59.0 | 590 | 0.1160 |
- | 0.1438 | 60.0 | 600 | 0.1181 |
- | 0.1438 | 61.0 | 610 | 0.1157 |
- | 0.1438 | 62.0 | 620 | 0.1166 |
- | 0.1438 | 63.0 | 630 | 0.1157 |
- | 0.1438 | 64.0 | 640 | 0.1173 |
- | 0.1438 | 65.0 | 650 | 0.1181 |
- | 0.1438 | 66.0 | 660 | 0.1171 |
- | 0.1438 | 67.0 | 670 | 0.1184 |
- | 0.1438 | 68.0 | 680 | 0.1189 |
- | 0.1438 | 69.0 | 690 | 0.1192 |
- | 0.1438 | 70.0 | 700 | 0.1190 |
+ | No log | 1.0 | 13 | 0.7769 |
+ | No log | 2.0 | 26 | 0.7101 |
+ | No log | 3.0 | 39 | 0.6086 |
+ | No log | 4.0 | 52 | 0.4720 |
+ | No log | 5.0 | 65 | 0.3012 |
+ | No log | 6.0 | 78 | 0.1616 |
+ | No log | 7.0 | 91 | 0.1318 |
+ | No log | 8.0 | 104 | 0.1268 |
+ | No log | 9.0 | 117 | 0.1236 |
+ | No log | 10.0 | 130 | 0.1225 |
+ | No log | 11.0 | 143 | 0.1218 |
+ | No log | 12.0 | 156 | 0.1216 |
+ | No log | 13.0 | 169 | 0.1178 |
+ | No log | 14.0 | 182 | 0.1169 |
+ | No log | 15.0 | 195 | 0.1161 |
+ | No log | 16.0 | 208 | 0.1137 |
+ | No log | 17.0 | 221 | 0.1138 |
+ | No log | 18.0 | 234 | 0.1155 |
+ | No log | 19.0 | 247 | 0.1094 |
+ | No log | 20.0 | 260 | 0.1100 |
+ | No log | 21.0 | 273 | 0.1067 |
+ | No log | 22.0 | 286 | 0.1117 |
+ | No log | 23.0 | 299 | 0.1089 |
+ | No log | 24.0 | 312 | 0.1060 |
+ | No log | 25.0 | 325 | 0.1090 |
+ | No log | 26.0 | 338 | 0.1057 |
+ | No log | 27.0 | 351 | 0.1055 |
+ | No log | 28.0 | 364 | 0.1087 |
+ | No log | 29.0 | 377 | 0.1112 |
+ | No log | 30.0 | 390 | 0.1074 |
+ | No log | 31.0 | 403 | 0.1108 |
+ | No log | 32.0 | 416 | 0.1160 |
+ | No log | 33.0 | 429 | 0.1172 |
+ | No log | 34.0 | 442 | 0.1125 |
+ | No log | 35.0 | 455 | 0.1157 |
+ | No log | 36.0 | 468 | 0.1180 |
+ | No log | 37.0 | 481 | 0.1157 |
+ | No log | 38.0 | 494 | 0.1137 |
+ | 0.1477 | 39.0 | 507 | 0.1139 |
+ | 0.1477 | 40.0 | 520 | 0.1152 |
  
  
  ### Framework versions
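The hyperparameters in the README diff correspond to a standard Hugging Face `Trainer` run, which is what auto-generates this kind of model card. The sketch below is an illustration only, not the author's actual training script: the dataset is unknown, so a tiny toy dataset stands in, and `output_dir`, the example sentences, `fp16`, `logging_steps`, and the per-epoch evaluation setting are assumptions inferred from the card rather than taken from this commit.

```python
# Minimal sketch (assumed setup, not the original script) of a Trainer run
# matching the hyperparameters listed in the README diff above.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base = "uer/gpt2-chinese-cluecorpussmall"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)
if tokenizer.pad_token is None:  # defensive: make sure padding is possible
    tokenizer.pad_token = tokenizer.unk_token


class ToyDataset(torch.utils.data.Dataset):
    """Stand-in for the real (unknown) training/evaluation data."""

    def __init__(self, texts):
        self.enc = tokenizer(
            texts, truncation=True, padding="max_length", max_length=32
        )

    def __len__(self):
        return len(self.enc["input_ids"])

    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = item["input_ids"].clone()  # causal LM: predict the input itself
        return item


train_ds = ToyDataset(["你好，世界。", "这是一个测试句子。"])  # placeholder texts
eval_ds = ToyDataset(["另一个评估句子。"])

args = TrainingArguments(
    output_dir="out",                 # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=30,
    per_device_eval_batch_size=30,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=40,
    fp16=True,                        # Native AMP, as in the card; needs a CUDA GPU
    evaluation_strategy="epoch",      # renamed to `eval_strategy` in newer releases
    logging_steps=500,                # assumed default cadence; see note below
    # Adam betas/epsilon are left at their defaults, which match the card.
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,
)
trainer.train()
```

Two things follow directly from the listed settings: with 13 optimizer steps per epoch, the 40-epoch run totals 520 steps, so the 500-step warmup occupies nearly the whole schedule and the learning rate only reaches 2e-05 briefly before the final decay; and the "No log" entries in the table are consistent with the default 500-step logging cadence, since the first training-loss log (0.1477) appears during epoch 39, which contains step 500.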
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7eb8d3bc941c3caf31fe7bc3c82058b24556d8e31969c492f70b1a6212c90789
+ oid sha256:9ed59447dad566a6cf90033e312caf6c2b7872cc62a8b0b2b1358c8c65609abc
  size 408366800
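`model.safetensors` is stored via Git LFS, so the repository only tracks a small pointer file; this commit swaps the pointer's SHA-256 object ID for the retrained weights while the file size stays at 408366800 bytes. A quick way to confirm that a locally downloaded copy matches this commit is to hash it and compare against the new `oid` (a minimal sketch; the local path is a placeholder):

```python
# Sketch: verify a downloaded model.safetensors against the Git LFS pointer above.
# Git LFS object IDs are plain SHA-256 digests of the file contents.
import hashlib

EXPECTED_OID = "9ed59447dad566a6cf90033e312caf6c2b7872cc62a8b0b2b1358c8c65609abc"
PATH = "model.safetensors"  # placeholder: wherever the file was downloaded

def sha256_of(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

digest = sha256_of(PATH)
print("OK" if digest == EXPECTED_OID else f"mismatch: {digest}")
```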