gpt_train_12_256

This model is a fine-tuned version of openai-community/gpt2 on the gokuls/wiki_book_corpus_raw_dataset_tiny dataset. It achieves the following results on the evaluation set:

  • Loss: 9.6016
  • Accuracy: 0.0878

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 36
  • eval_batch_size: 36
  • seed: 10
  • distributed_type: multi-GPU
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
  • mixed_precision_training: Native AMP
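
For anyone reproducing the run, a hedged sketch of how the list above maps onto transformers.TrainingArguments; output_dir is a placeholder, multi-GPU distribution is handled by the launcher rather than these arguments, and anything not listed keeps its library default:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt_train_12_256",   # placeholder, not taken from the card
    learning_rate=1e-5,
    per_device_train_batch_size=36,
    per_device_eval_batch_size=36,
    seed=10,
    num_train_epochs=100,
    lr_scheduler_type="linear",
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    fp16=True,                       # "Native AMP" mixed precision
)
```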

Training results

Training Loss | Epoch | Step | Validation Loss | Accuracy
10.875 0.0001 1 10.875 0.0031
10.875 0.0001 2 10.875 0.0031
10.875 0.0002 3 10.875 0.0031
10.875 0.0002 4 10.875 0.0031
10.8672 0.0003 5 10.875 0.0031
10.875 0.0003 6 10.875 0.0031
10.8672 0.0004 7 10.875 0.0031
10.875 0.0004 8 10.875 0.0031
10.875 0.0005 9 10.875 0.0031
10.875 0.0005 10 10.875 0.0031
10.875 0.0006 11 10.875 0.0031
10.875 0.0007 12 10.875 0.0031
10.875 0.0007 13 10.875 0.0031
10.875 0.0008 14 10.875 0.0031
10.875 0.0008 15 10.875 0.0031
10.875 0.0009 16 10.875 0.0031
10.8672 0.0009 17 10.875 0.0031
10.875 0.0010 18 10.8047 0.0103
10.8125 0.0010 19 10.75 0.0119
10.7578 0.0011 20 10.6953 0.0180
10.7188 0.0011 21 10.6562 0.0319
10.6719 0.0012 22 10.625 0.0470
10.6328 0.0013 23 10.5938 0.0530
10.6172 0.0013 24 10.5703 0.0542
10.5859 0.0014 25 10.5469 0.0543
10.5547 0.0014 26 10.5312 0.0540
10.5391 0.0015 27 10.5156 0.0534
10.5547 0.0015 28 10.5 0.0531
10.5156 0.0016 29 10.4844 0.0535
10.4844 0.0016 30 10.4766 0.0542
10.4844 0.0017 31 10.4609 0.0548
10.4766 0.0017 32 10.4531 0.0551
10.4766 0.0018 33 10.4453 0.0557
10.4531 0.0019 34 10.4375 0.0565
10.4453 0.0019 35 10.4297 0.0570
10.4375 0.0020 36 10.4219 0.0575
10.4375 0.0020 37 10.4141 0.0581
10.4453 0.0021 38 10.4141 0.0583
10.3984 0.0021 39 10.4062 0.0585
10.4141 0.0022 40 10.3984 0.0586
10.4062 0.0022 41 10.3906 0.0587
10.3984 0.0023 42 10.3906 0.0587
10.3906 0.0023 43 10.3828 0.0588
10.4062 0.0024 44 10.375 0.0591
10.375 0.0025 45 10.375 0.0592
10.3984 0.0025 46 10.3672 0.0592
10.3828 0.0026 47 10.3594 0.0593
10.375 0.0026 48 10.3516 0.0597
10.3594 0.0027 49 10.3516 0.0599
10.3516 0.0027 50 10.3438 0.0602
10.3438 0.0028 51 10.3359 0.0604
10.3516 0.0028 52 10.3281 0.0606
10.3594 0.0029 53 10.3281 0.0607
10.3438 0.0029 54 10.3203 0.0608
10.3281 0.0030 55 10.3125 0.0608
10.3281 0.0031 56 10.3125 0.0607
10.3281 0.0031 57 10.3047 0.0607
10.3438 0.0032 58 10.3047 0.0607
10.3125 0.0032 59 10.2969 0.0609
10.3203 0.0033 60 10.2969 0.0612
10.3125 0.0033 61 10.2891 0.0615
10.2969 0.0034 62 10.2812 0.0618
10.2891 0.0034 63 10.2812 0.0620
10.2969 0.0035 64 10.2734 0.0622
10.2891 0.0035 65 10.2734 0.0622
10.2734 0.0036 66 10.2656 0.0623
10.2656 0.0037 67 10.2656 0.0623
10.2656 0.0037 68 10.2578 0.0623
10.2578 0.0038 69 10.25 0.0622
10.25 0.0038 70 10.25 0.0622
10.2656 0.0039 71 10.2422 0.0623
10.2344 0.0039 72 10.2422 0.0626
10.2578 0.0040 73 10.2344 0.0629
10.2266 0.0040 74 10.2344 0.0632
10.2422 0.0041 75 10.2266 0.0633
10.2656 0.0041 76 10.2266 0.0633
10.2266 0.0042 77 10.2188 0.0632
10.2422 0.0043 78 10.2188 0.0631
10.2031 0.0043 79 10.2109 0.0630
10.2031 0.0044 80 10.2109 0.0631
10.2188 0.0044 81 10.2031 0.0633
10.2188 0.0045 82 10.2031 0.0637
10.2344 0.0045 83 10.1953 0.0641
10.2188 0.0046 84 10.1953 0.0647
10.2031 0.0046 85 10.1875 0.0653
10.2266 0.0047 86 10.1875 0.0657
10.2109 0.0047 87 10.1797 0.0660
10.1641 0.0048 88 10.1797 0.0660
10.1953 0.0048 89 10.1719 0.0660
10.1875 0.0049 90 10.1719 0.0658
10.2031 0.0050 91 10.1641 0.0658
10.1719 0.0050 92 10.1641 0.0658
10.1953 0.0051 93 10.1562 0.0660
10.1641 0.0051 94 10.1562 0.0665
10.1797 0.0052 95 10.1484 0.0673
10.1797 0.0052 96 10.1484 0.0682
10.1406 0.0053 97 10.1406 0.0690
10.1562 0.0053 98 10.1406 0.0696
10.1406 0.0054 99 10.1328 0.0699
10.1641 0.0054 100 10.1328 0.0700
10.1797 0.0055 101 10.125 0.0699
10.1484 0.0056 102 10.125 0.0699
10.1406 0.0056 103 10.1172 0.0701
10.1328 0.0057 104 10.1172 0.0706
10.0938 0.0057 105 10.1094 0.0712
10.1016 0.0058 106 10.1094 0.0719
10.1016 0.0058 107 10.1016 0.0725
10.1094 0.0059 108 10.1016 0.0728
10.1016 0.0059 109 10.1016 0.0729
10.1016 0.0060 110 10.0938 0.0729
10.0781 0.0060 111 10.0938 0.0728
10.0938 0.0061 112 10.0859 0.0727
10.1172 0.0062 113 10.0859 0.0725
10.1016 0.0062 114 10.0781 0.0725
10.0938 0.0063 115 10.0781 0.0726
10.1016 0.0063 116 10.0703 0.0730
10.0703 0.0064 117 10.0703 0.0733
10.0938 0.0064 118 10.0625 0.0738
10.0859 0.0065 119 10.0625 0.0742
10.0781 0.0065 120 10.0625 0.0744
10.0625 0.0066 121 10.0547 0.0745
10.0547 0.0066 122 10.0547 0.0746
10.0781 0.0067 123 10.0469 0.0746
10.0625 0.0068 124 10.0469 0.0745
10.0781 0.0068 125 10.0391 0.0745
10.0781 0.0069 126 10.0391 0.0747
10.0703 0.0069 127 10.0391 0.0752
10.0547 0.0070 128 10.0312 0.0758
10.0469 0.0070 129 10.0312 0.0762
10.0391 0.0071 130 10.0234 0.0765
10.0391 0.0071 131 10.0234 0.0765
10.0469 0.0072 132 10.0156 0.0764
10.0469 0.0072 133 10.0156 0.0761
10.0234 0.0073 134 10.0156 0.0759
10.0312 0.0074 135 10.0078 0.0757
10.0312 0.0074 136 10.0078 0.0757
10.0078 0.0075 137 10.0 0.0759
10.0 0.0075 138 10.0 0.0763
10.0078 0.0076 139 10.0 0.0768
10.0234 0.0076 140 9.9922 0.0774
9.9922 0.0077 141 9.9922 0.0779
10.0234 0.0077 142 9.9844 0.0782
9.9766 0.0078 143 9.9844 0.0783
10.0156 0.0078 144 9.9844 0.0782
9.9844 0.0079 145 9.9766 0.0780
9.9922 0.0080 146 9.9766 0.0778
9.9844 0.0080 147 9.9688 0.0776
10.0 0.0081 148 9.9688 0.0775
9.9766 0.0081 149 9.9688 0.0776
9.9688 0.0082 150 9.9609 0.0778
9.9844 0.0082 151 9.9609 0.0782
9.9766 0.0083 152 9.9531 0.0785
9.9766 0.0083 153 9.9531 0.0787
9.9922 0.0084 154 9.9453 0.0787
9.9688 0.0084 155 9.9453 0.0787
9.9141 0.0085 156 9.9453 0.0785
9.9453 0.0086 157 9.9375 0.0783
9.9375 0.0086 158 9.9375 0.0782
9.9453 0.0087 159 9.9375 0.0782
9.9531 0.0087 160 9.9297 0.0784
9.9297 0.0088 161 9.9297 0.0788
9.9375 0.0088 162 9.9219 0.0793
9.9219 0.0089 163 9.9219 0.0797
9.9297 0.0089 164 9.9219 0.0799
9.9219 0.0090 165 9.9141 0.0802
9.9141 0.0090 166 9.9141 0.0801
9.9141 0.0091 167 9.9062 0.0799
9.9219 0.0092 168 9.9062 0.0797
9.9062 0.0092 169 9.9062 0.0795
9.9062 0.0093 170 9.8984 0.0795
9.9062 0.0093 171 9.8984 0.0797
9.9297 0.0094 172 9.8906 0.0800
9.8984 0.0094 173 9.8906 0.0804
9.875 0.0095 174 9.8906 0.0808
9.8984 0.0095 175 9.8828 0.0810
9.8828 0.0096 176 9.8828 0.0811
9.8828 0.0096 177 9.8828 0.0811
9.875 0.0097 178 9.875 0.0808
9.8828 0.0098 179 9.875 0.0805
9.8906 0.0098 180 9.8672 0.0803
9.8594 0.0099 181 9.8672 0.0803
9.8828 0.0099 182 9.8672 0.0804
9.8906 0.0100 183 9.8594 0.0807
9.8438 0.0100 184 9.8594 0.0809
9.8672 0.0101 185 9.8516 0.0810
9.8828 0.0101 186 9.8516 0.0811
9.8828 0.0102 187 9.8516 0.0811
9.8594 0.0102 188 9.8438 0.0811
9.8672 0.0103 189 9.8438 0.0811
9.8516 0.0104 190 9.8438 0.0812
9.8281 0.0104 191 9.8359 0.0813
9.8359 0.0105 192 9.8359 0.0816
9.8359 0.0105 193 9.8281 0.0818
9.8516 0.0106 194 9.8281 0.0819
9.8125 0.0106 195 9.8281 0.0817
9.8047 0.0107 196 9.8203 0.0815
9.8203 0.0107 197 9.8203 0.0814
9.8438 0.0108 198 9.8203 0.0814
9.8281 0.0108 199 9.8125 0.0815
9.8516 0.0109 200 9.8125 0.0819
9.8125 0.0110 201 9.8047 0.0823
9.7969 0.0110 202 9.8047 0.0826
9.8359 0.0111 203 9.8047 0.0827
9.8359 0.0111 204 9.7969 0.0828
9.8281 0.0112 205 9.7969 0.0826
9.8359 0.0112 206 9.7969 0.0824
9.8125 0.0113 207 9.7891 0.0823
9.8281 0.0113 208 9.7891 0.0824
9.8203 0.0114 209 9.7812 0.0826
9.7891 0.0114 210 9.7812 0.0826
9.7734 0.0115 211 9.7812 0.0826
9.7734 0.0116 212 9.7734 0.0830
9.7969 0.0116 213 9.7734 0.0835
9.7969 0.0117 214 9.7656 0.0840
9.7656 0.0117 215 9.7656 0.0844
9.7891 0.0118 216 9.7656 0.0844
9.7812 0.0118 217 9.7578 0.0845
9.7812 0.0119 218 9.7578 0.0844
9.7891 0.0119 219 9.7578 0.0844
9.7734 0.0120 220 9.75 0.0844
9.75 0.0120 221 9.75 0.0844
9.7578 0.0121 222 9.7422 0.0843
9.7422 0.0122 223 9.7422 0.0842
9.7578 0.0122 224 9.7422 0.0843
9.7344 0.0123 225 9.7344 0.0845
9.7578 0.0123 226 9.7344 0.0848
9.7734 0.0124 227 9.7344 0.0851
9.7266 0.0124 228 9.7266 0.0851
9.7344 0.0125 229 9.7266 0.0849
9.7344 0.0125 230 9.7266 0.0849
9.6875 0.0126 231 9.7188 0.0850
9.75 0.0126 232 9.7188 0.0854
9.7188 0.0127 233 9.7109 0.0857
9.7109 0.0128 234 9.7109 0.0860
9.7031 0.0128 235 9.7109 0.0861
9.7422 0.0129 236 9.7031 0.0861
9.7266 0.0129 237 9.7031 0.0861
9.7109 0.0130 238 9.7031 0.0858
9.7422 0.0130 239 9.6953 0.0856
9.6875 0.0131 240 9.6953 0.0854
9.7109 0.0131 241 9.6953 0.0853
9.6953 0.0132 242 9.6875 0.0853
9.7109 0.0132 243 9.6875 0.0856
9.6719 0.0133 244 9.6797 0.0859
9.7109 0.0134 245 9.6797 0.0863
9.6719 0.0134 246 9.6797 0.0866
9.7109 0.0135 247 9.6719 0.0867
9.7031 0.0135 248 9.6719 0.0866
9.6641 0.0136 249 9.6719 0.0866
9.6953 0.0136 250 9.6641 0.0866
9.6641 0.0137 251 9.6641 0.0866
9.6719 0.0137 252 9.6641 0.0868
9.6719 0.0138 253 9.6562 0.0869
9.6797 0.0138 254 9.6562 0.0870
9.6797 0.0139 255 9.6484 0.0870
9.6641 0.0139 256 9.6484 0.0870
9.6562 0.0140 257 9.6484 0.0869
9.6562 0.0141 258 9.6406 0.0867
9.6562 0.0141 259 9.6406 0.0865
9.6641 0.0142 260 9.6406 0.0866
9.6406 0.0142 261 9.6328 0.0868
9.6484 0.0143 262 9.6328 0.0871
9.6484 0.0143 263 9.6328 0.0873
9.6328 0.0144 264 9.625 0.0874
9.625 0.0144 265 9.625 0.0875
9.6328 0.0145 266 9.6172 0.0877
9.6641 0.0145 267 9.6172 0.0877
9.6484 0.0146 268 9.6172 0.0877
9.6328 0.0147 269 9.6094 0.0877
9.625 0.0147 270 9.6094 0.0875
9.625 0.0148 271 9.6094 0.0875
9.6094 0.0148 272 9.6016 0.0875
9.6172 0.0149 273 9.6016 0.0877
9.625 0.0149 274 9.6016 0.0878
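
Assuming the validation loss is the standard mean token-level cross-entropy in nats (the usual causal-LM objective), the final loss of 9.6016 corresponds to a perplexity of roughly exp(9.6016) ≈ 1.5 × 10^4, consistent with the low next-token accuracy:

```python
import math

eval_loss = 9.6016                 # final validation loss from the table above
perplexity = math.exp(eval_loss)   # valid if the loss is mean cross-entropy in nats
print(f"perplexity ≈ {perplexity:,.0f}")  # ≈ 14,788
```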

Framework versions

  • Transformers 4.41.2
  • PyTorch 2.1.0a0+32f93b1
  • Datasets 2.20.0
  • Tokenizers 0.19.1
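
To confirm a local environment matches these versions, a small sanity-check sketch (package names as imported in Python):

```python
import datasets
import tokenizers
import torch
import transformers

# Versions reported on this card.
expected = {
    "transformers": "4.41.2",
    "torch": "2.1.0a0+32f93b1",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    print(f"{name}: installed {installed[name]}, card reports {want}")
```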
Safetensors

  • Model size: 35.5M params
  • Tensor type: FP16

Model tree for gokulsrinivasagan/gpt_train_12_256

  • Base model: openai-community/gpt2

Dataset used to train gokulsrinivasagan/gpt_train_12_256

  • gokuls/wiki_book_corpus_raw_dataset_tiny

Evaluation results

  • Accuracy on gokuls/wiki_book_corpus_raw_dataset_tiny (self-reported): 0.088