babylm-subwords-text-gpt2_lm-model

This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0621
  • Model Preparation Time: 0.0025
  • Perplexity: 7.8628
  • Bpc: 2.9750
  • Babyslm Test Syntactic: 0.8980
  • Blimp Supplement: 0.6697
  • Blimp Filtered: 0.7626
  • Blimp Supplement Hypernym: 0.4869
  • Blimp Supplement Qa Congruence Easy: 0.7344
  • Blimp Supplement Qa Congruence Tricky: 0.5030
  • Blimp Supplement Subject Aux Inversion: 0.8384
  • Blimp Supplement Turn Taking: 0.7857
  • Blimp Adjunct Island Filtered: 0.8093
  • Blimp Anaphor Gender Agreement Filtered: 0.9557
  • Blimp Anaphor Number Agreement Filtered: 0.9925
  • Blimp Animate Subject Passive Filtered: 0.7520
  • Blimp Animate Subject Trans Filtered: 0.8906
  • Blimp Causative Filtered: 0.7763
  • Blimp Complex Np Island Filtered: 0.4811
  • Blimp Coordinate Structure Constraint Complex Left Branch Filtered: 0.5706
  • Blimp Coordinate Structure Constraint Object Extraction Filtered: 0.7566
  • Blimp Determiner Noun Agreement 1 Filtered: 0.9699
  • Blimp Determiner Noun Agreement 2 Filtered: 0.9731
  • Blimp Determiner Noun Agreement Irregular 1 Filtered: 0.8664
  • Blimp Determiner Noun Agreement Irregular 2 Filtered: 0.9280
  • Blimp Determiner Noun Agreement With Adj 2 Filtered: 0.9437
  • Blimp Determiner Noun Agreement With Adj Irregular 1 Filtered: 0.8468
  • Blimp Determiner Noun Agreement With Adj Irregular 2 Filtered: 0.8702
  • Blimp Determiner Noun Agreement With Adjective 1 Filtered: 0.9475
  • Blimp Distractor Agreement Relational Noun Filtered: 0.8731
  • Blimp Distractor Agreement Relative Clause Filtered: 0.7348
  • Blimp Drop Argument Filtered: 0.7370
  • Blimp Ellipsis N Bar 1 Filtered: 0.7145
  • Blimp Ellipsis N Bar 2 Filtered: 0.7911
  • Blimp Existential There Object Raising Filtered: 0.7574
  • Blimp Existential There Quantifiers 1 Filtered: 0.9774
  • Blimp Existential There Quantifiers 2 Filtered: 0.4468
  • Blimp Existential There Subject Raising Filtered: 0.8755
  • Blimp Expletive It Object Raising Filtered: 0.7800
  • Blimp Inchoative Filtered: 0.6094
  • Blimp Intransitive Filtered: 0.7247
  • Blimp Irregular Past Participle Adjectives Filtered: 0.9355
  • Blimp Irregular Past Participle Verbs Filtered: 0.8142
  • Blimp Irregular Plural Subject Verb Agreement 1 Filtered: 0.8831
  • Blimp Irregular Plural Subject Verb Agreement 2 Filtered: 0.8756
  • Blimp Left Branch Island Echo Question Filtered: 0.3506
  • Blimp Left Branch Island Simple Question Filtered: 0.6572
  • Blimp Matrix Question Npi Licensor Present Filtered: 0.5597
  • Blimp Npi Present 1 Filtered: 0.5193
  • Blimp Npi Present 2 Filtered: 0.6149
  • Blimp Only Npi Licensor Present Filtered: 0.9433
  • Blimp Only Npi Scope Filtered: 0.7957
  • Blimp Passive 1 Filtered: 0.8952
  • Blimp Passive 2 Filtered: 0.8627
  • Blimp Principle A C Command Filtered: 0.6332
  • Blimp Principle A Case 1 Filtered: 1.0
  • Blimp Principle A Case 2 Filtered: 0.9290
  • Blimp Principle A Domain 1 Filtered: 0.9847
  • Blimp Principle A Domain 2 Filtered: 0.6623
  • Blimp Principle A Domain 3 Filtered: 0.6259
  • Blimp Principle A Reconstruction Filtered: 0.1892
  • Blimp Regular Plural Subject Verb Agreement 1 Filtered: 0.8933
  • Blimp Regular Plural Subject Verb Agreement 2 Filtered: 0.8349
  • Blimp Sentential Negation Npi Licensor Present Filtered: 0.9706
  • Blimp Sentential Negation Npi Scope Filtered: 0.4363
  • Blimp Sentential Subject Island Filtered: 0.3569
  • Blimp Superlative Quantifiers 1 Filtered: 0.7783
  • Blimp Superlative Quantifiers 2 Filtered: 0.8063
  • Blimp Tough Vs Raising 1 Filtered: 0.4568
  • Blimp Tough Vs Raising 2 Filtered: 0.8630
  • Blimp Transitive Filtered: 0.8410
  • Blimp Wh Island Filtered: 0.7146
  • Blimp Wh Questions Object Gap Filtered: 0.7520
  • Blimp Wh Questions Subject Gap Filtered: 0.9521
  • Blimp Wh Questions Subject Gap Long Distance Filtered: 0.9113
  • Blimp Wh Vs That No Gap Filtered: 0.9826
  • Blimp Wh Vs That No Gap Long Distance Filtered: 0.9829
  • Blimp Wh Vs That With Gap Filtered: 0.3852
  • Blimp Wh Vs That With Gap Long Distance Filtered: 0.0912
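
The card does not define how Perplexity and Bpc are computed. Numerically, the reported values are consistent with perplexity = exp(loss) and Bpc = loss / ln 2, that is, the same cross-entropy loss expressed in bits. The sketch below checks this under that assumption; the function names are illustrative and do not come from the training code.

```python
import math

# Final-checkpoint values copied from the results list above.
eval_loss = 2.0621            # cross-entropy, assumed to be in nats
reported_perplexity = 7.8628
reported_bpc = 2.9750


def perplexity_from_loss(loss_nats: float) -> float:
    """Perplexity is the exponential of the mean cross-entropy (in nats)."""
    return math.exp(loss_nats)


def bits_from_loss(loss_nats: float) -> float:
    """Convert a loss in nats to bits by dividing by ln 2."""
    return loss_nats / math.log(2)


derived_perplexity = perplexity_from_loss(eval_loss)
derived_bpc = bits_from_loss(eval_loss)

# Both derived values agree with the reported ones up to rounding of the loss.
print(f"perplexity: derived {derived_perplexity:.4f}, reported {reported_perplexity}")
print(f"bpc:        derived {derived_bpc:.4f}, reported {reported_bpc}")
```

If this reading is right, Loss, Perplexity, and Bpc carry the same information, so only one of them needs to be tracked when comparing checkpoints.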

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 90000
  • training_steps: 400000
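
To make the schedule concrete, here is a minimal sketch of the learning rate implied by these settings. It assumes the usual meaning of a linear scheduler with warmup (linear ramp from 0 to the peak rate, then linear decay to 0 at the final step), which is how `get_linear_schedule_with_warmup` in Transformers behaves. The function name `linear_schedule_lr` is illustrative, not part of the training code.

```python
def linear_schedule_lr(
    step: int,
    peak_lr: float = 0.001,        # learning_rate
    warmup_steps: int = 90_000,    # lr_scheduler_warmup_steps
    total_steps: int = 400_000,    # training_steps
) -> float:
    """Learning rate at a given optimizer step (assumed linear warmup + linear decay)."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to the peak rate.
        return peak_lr * step / warmup_steps
    # Decay: fall linearly from the peak rate to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / (total_steps - warmup_steps)


for step in (0, 45_000, 90_000, 245_000, 400_000):
    print(f"step {step:>7}: lr = {linear_schedule_lr(step):.6f}")
```

Under these assumptions the peak rate of 0.001 is reached at step 90000, so the first evaluation checkpoint (step 50000) falls inside the warmup phase.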

Training results

Each column is one evaluation checkpoint, labeled by training step.

Step | 50000 | 100000 | 150000 | 200000 | 250000 | 300000 | 350000 | 400000
Training Loss | 2.1359 | 2.0144 | 1.9281 | 1.8715 | 1.8345 | 1.8001 | 1.7665 | 1.7419
Epoch | 0.8323 | 1.6645 | 2.4968 | 3.3290 | 4.1613 | 4.9935 | 5.8258 | 6.6580
Validation Loss | 2.3362 | 2.2289 | 2.1594 | 2.1227 | 2.0978 | 2.0813 | 2.0700 | 2.0621
Model Preparation Time | 0.0025 | 0.0025 | 0.0025 | 0.0025 | 0.0025 | 0.0025 | 0.0025 | 0.0025
Perplexity | 10.3414 | 9.2894 | 8.6658 | 8.3534 | 8.1484 | 8.0149 | 7.9247 | 7.8628
Bpc | 3.3704 | 3.2156 | 3.1153 | 3.0624 | 3.0265 | 3.0027 | 2.9864 | 2.9750
Babyslm Test Syntactic | 0.8505 | 0.8839 | 0.8965 | 0.8968 | 0.9 | 0.8936 | 0.9051 | 0.8980
Blimp Supplement | 0.6374 | 0.6279 | 0.6471 | 0.6640 | 0.6548 | 0.6503 | 0.6706 | 0.6697
Blimp Filtered | 0.6767 | 0.7278 | 0.7344 | 0.7515 | 0.7531 | 0.7545 | 0.7619 | 0.7626
Blimp Supplement Hypernym | 0.5131 | 0.4929 | 0.5012 | 0.5 | 0.4656 | 0.4976 | 0.4988 | 0.4869
Blimp Supplement Qa Congruence Easy | 0.6094 | 0.6406 | 0.6406 | 0.7031 | 0.6719 | 0.6875 | 0.7344 | 0.7344
Blimp Supplement Qa Congruence Tricky | 0.5030 | 0.4606 | 0.5273 | 0.4970 | 0.5030 | 0.4667 | 0.5091 | 0.5030
Blimp Supplement Subject Aux Inversion | 0.8580 | 0.8381 | 0.8270 | 0.8521 | 0.8585 | 0.8353 | 0.8285 | 0.8384
Blimp Supplement Turn Taking | 0.7036 | 0.7071 | 0.7393 | 0.7679 | 0.775 | 0.7643 | 0.7821 | 0.7857
Blimp Adjunct Island Filtered | 0.7209 | 0.6875 | 0.7295 | 0.7694 | 0.8157 | 0.7457 | 0.8039 | 0.8093
Blimp Anaphor Gender Agreement Filtered | 0.6910 | 0.8661 | 0.9073 | 0.9197 | 0.9598 | 0.9269 | 0.9351 | 0.9557
Blimp Anaphor Number Agreement Filtered | 0.8969 | 0.9560 | 0.9796 | 0.9828 | 0.9914 | 0.9936 | 0.9903 | 0.9925
Blimp Animate Subject Passive Filtered | 0.6436 | 0.6894 | 0.7017 | 0.7106 | 0.7173 | 0.7464 | 0.7475 | 0.7520
Blimp Animate Subject Trans Filtered | 0.8180 | 0.8537 | 0.8852 | 0.8754 | 0.8657 | 0.8841 | 0.8862 | 0.8906
Blimp Causative Filtered | 0.6711 | 0.6870 | 0.7494 | 0.7445 | 0.7347 | 0.7543 | 0.7555 | 0.7763
Blimp Complex Np Island Filtered | 0.4350 | 0.4917 | 0.4480 | 0.4456 | 0.4681 | 0.4787 | 0.4693 | 0.4811
Blimp Coordinate Structure Constraint Complex Left Branch Filtered | 0.2528 | 0.4117 | 0.4558 | 0.5552 | 0.4812 | 0.5033 | 0.5717 | 0.5706
Blimp Coordinate Structure Constraint Object Extraction Filtered | 0.6322 | 0.6554 | 0.7439 | 0.6839 | 0.7313 | 0.7471 | 0.7155 | 0.7566
Blimp Determiner Noun Agreement 1 Filtered | 0.9354 | 0.9709 | 0.9580 | 0.9677 | 0.9699 | 0.9623 | 0.9688 | 0.9699
Blimp Determiner Noun Agreement 2 Filtered | 0.9517 | 0.9656 | 0.9710 | 0.9731 | 0.9635 | 0.9667 | 0.9721 | 0.9731
Blimp Determiner Noun Agreement Irregular 1 Filtered | 0.7195 | 0.7739 | 0.8120 | 0.7885 | 0.8267 | 0.8590 | 0.8634 | 0.8664
Blimp Determiner Noun Agreement Irregular 2 Filtered | 0.8622 | 0.8902 | 0.9049 | 0.9037 | 0.8939 | 0.9354 | 0.9329 | 0.9280
Blimp Determiner Noun Agreement With Adj 2 Filtered | 0.9107 | 0.9299 | 0.9426 | 0.9309 | 0.9458 | 0.9501 | 0.9437 | 0.9437
Blimp Determiner Noun Agreement With Adj Irregular 1 Filtered | 0.7855 | 0.7813 | 0.8370 | 0.8398 | 0.8482 | 0.8315 | 0.8482 | 0.8468
Blimp Determiner Noun Agreement With Adj Irregular 2 Filtered | 0.8131 | 0.8286 | 0.8464 | 0.8512 | 0.8560 | 0.8738 | 0.8714 | 0.8702
Blimp Determiner Noun Agreement With Adjective 1 Filtered | 0.9025 | 0.9357 | 0.9421 | 0.9357 | 0.9528 | 0.9453 | 0.9528 | 0.9475
Blimp Distractor Agreement Relational Noun Filtered | 0.4860 | 0.7602 | 0.7944 | 0.8490 | 0.8541 | 0.8261 | 0.8744 | 0.8731
Blimp Distractor Agreement Relative Clause Filtered | 0.5144 | 0.6533 | 0.7038 | 0.7313 | 0.7497 | 0.7187 | 0.7199 | 0.7348
Blimp Drop Argument Filtered | 0.7228 | 0.7413 | 0.7663 | 0.7435 | 0.7587 | 0.7652 | 0.7293 | 0.7370
Blimp Ellipsis N Bar 1 Filtered | 0.5935 | 0.6421 | 0.6783 | 0.6895 | 0.7145 | 0.7219 | 0.6945 | 0.7145
Blimp Ellipsis N Bar 2 Filtered | 0.7379 | 0.7271 | 0.7428 | 0.7681 | 0.7959 | 0.7899 | 0.7862 | 0.7911
Blimp Existential There Object Raising Filtered | 0.7808 | 0.8140 | 0.7426 | 0.7968 | 0.7648 | 0.7796 | 0.7672 | 0.7574
Blimp Existential There Quantifiers 1 Filtered | 0.9935 | 0.9817 | 0.9763 | 0.9645 | 0.9731 | 0.9763 | 0.9731 | 0.9774
Blimp Existential There Quantifiers 2 Filtered | 0.2481 | 0.4105 | 0.4863 | 0.5049 | 0.4171 | 0.3359 | 0.4874 | 0.4468
Blimp Existential There Subject Raising Filtered | 0.7727 | 0.8323 | 0.8193 | 0.8463 | 0.8701 | 0.8636 | 0.8647 | 0.8755
Blimp Expletive It Object Raising Filtered | 0.7536 | 0.7642 | 0.7391 | 0.7655 | 0.7681 | 0.7734 | 0.7721 | 0.7800
Blimp Inchoative Filtered | 0.4667 | 0.5754 | 0.5661 | 0.5942 | 0.6140 | 0.5860 | 0.6129 | 0.6094
Blimp Intransitive Filtered | 0.5933 | 0.6740 | 0.6959 | 0.6717 | 0.7293 | 0.7316 | 0.7108 | 0.7247
Blimp Irregular Past Participle Adjectives Filtered | 0.9043 | 0.9553 | 0.8866 | 0.8803 | 0.9188 | 0.9272 | 0.9116 | 0.9355
Blimp Irregular Past Participle Verbs Filtered | 0.8206 | 0.7665 | 0.8089 | 0.8524 | 0.7909 | 0.8142 | 0.8280 | 0.8142
Blimp Irregular Plural Subject Verb Agreement 1 Filtered | 0.7624 | 0.8545 | 0.8420 | 0.8831 | 0.8719 | 0.8595 | 0.8794 | 0.8831
Blimp Irregular Plural Subject Verb Agreement 2 Filtered | 0.8217 | 0.8688 | 0.8845 | 0.8767 | 0.8733 | 0.8845 | 0.8890 | 0.8756
Blimp Left Branch Island Echo Question Filtered | 0.2228 | 0.2798 | 0.2482 | 0.3210 | 0.3506 | 0.3157 | 0.3326 | 0.3506
Blimp Left Branch Island Simple Question Filtered | 0.3922 | 0.5668 | 0.6278 | 0.6562 | 0.5910 | 0.6288 | 0.6761 | 0.6572
Blimp Matrix Question Npi Licensor Present Filtered | 0.2131 | 0.3434 | 0.5393 | 0.4295 | 0.6168 | 0.4801 | 0.5210 | 0.5597
Blimp Npi Present 1 Filtered | 0.3311 | 0.4323 | 0.4554 | 0.5336 | 0.5424 | 0.5050 | 0.5237 | 0.5193
Blimp Npi Present 2 Filtered | 0.4136 | 0.6007 | 0.5799 | 0.6346 | 0.6324 | 0.6258 | 0.6269 | 0.6149
Blimp Only Npi Licensor Present Filtered | 0.8503 | 0.9977 | 0.9252 | 0.9422 | 0.9921 | 0.9649 | 0.9524 | 0.9433
Blimp Only Npi Scope Filtered | 0.8423 | 0.8244 | 0.6762 | 0.8220 | 0.7539 | 0.8363 | 0.8184 | 0.7957
Blimp Passive 1 Filtered | 0.8036 | 0.8571 | 0.8905 | 0.8964 | 0.8940 | 0.8976 | 0.8952 | 0.8952
Blimp Passive 2 Filtered | 0.7663 | 0.7730 | 0.8328 | 0.8084 | 0.8427 | 0.8527 | 0.8583 | 0.8627
Blimp Principle A C Command Filtered | 0.5899 | 0.6163 | 0.5825 | 0.5719 | 0.6068 | 0.6342 | 0.5941 | 0.6332
Blimp Principle A Case 1 Filtered | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0
Blimp Principle A Case 2 Filtered | 0.8973 | 0.9235 | 0.9355 | 0.9530 | 0.9399 | 0.9213 | 0.9301 | 0.9290
Blimp Principle A Domain 1 Filtered | 0.9880 | 0.9836 | 0.9836 | 0.9869 | 0.9770 | 0.9945 | 0.9825 | 0.9847
Blimp Principle A Domain 2 Filtered | 0.5311 | 0.5891 | 0.5421 | 0.6055 | 0.5716 | 0.6601 | 0.6820 | 0.6623
Blimp Principle A Domain 3 Filtered | 0.5590 | 0.5632 | 0.5537 | 0.6026 | 0.5919 | 0.5994 | 0.6472 | 0.6259
Blimp Principle A Reconstruction Filtered | 0.3557 | 0.2844 | 0.2254 | 0.2161 | 0.2110 | 0.2296 | 0.1996 | 0.1892
Blimp Regular Plural Subject Verb Agreement 1 Filtered | 0.8326 | 0.8820 | 0.9045 | 0.9247 | 0.9079 | 0.9124 | 0.9124 | 0.8933
Blimp Regular Plural Subject Verb Agreement 2 Filtered | 0.7376 | 0.8169 | 0.8360 | 0.8328 | 0.8455 | 0.8190 | 0.8370 | 0.8349
Blimp Sentential Negation Npi Licensor Present Filtered | 0.9510 | 0.9978 | 0.9902 | 0.9891 | 0.9717 | 0.9804 | 0.9826 | 0.9706
Blimp Sentential Negation Npi Scope Filtered | 0.3846 | 0.4719 | 0.3054 | 0.4064 | 0.3823 | 0.4363 | 0.4650 | 0.4363
Blimp Sentential Subject Island Filtered | 0.4849 | 0.4693 | 0.3861 | 0.3985 | 0.4100 | 0.3913 | 0.3777 | 0.3569
Blimp Superlative Quantifiers 1 Filtered | 0.7324 | 0.9081 | 0.9040 | 0.9009 | 0.7252 | 0.7211 | 0.7273 | 0.7783
Blimp Superlative Quantifiers 2 Filtered | 0.8093 | 0.8813 | 0.8590 | 0.8306 | 0.8195 | 0.7890 | 0.8256 | 0.8063
Blimp Tough Vs Raising 1 Filtered | 0.2584 | 0.2964 | 0.3576 | 0.4504 | 0.3713 | 0.4536 | 0.4568 | 0.4568
Blimp Tough Vs Raising 2 Filtered | 0.8522 | 0.8217 | 0.8565 | 0.8435 | 0.8815 | 0.8522 | 0.8489 | 0.8630
Blimp Transitive Filtered | 0.7615 | 0.7546 | 0.7995 | 0.8157 | 0.8145 | 0.8410 | 0.8376 | 0.8410
Blimp Wh Island Filtered | 0.4396 | 0.5490 | 0.6021 | 0.6156 | 0.7198 | 0.7271 | 0.7271 | 0.7146
Blimp Wh Questions Object Gap Filtered | 0.5972 | 0.6100 | 0.6892 | 0.7451 | 0.7404 | 0.7276 | 0.7730 | 0.7520
Blimp Wh Questions Subject Gap Filtered | 0.8953 | 0.9154 | 0.9131 | 0.9376 | 0.9477 | 0.9399 | 0.9555 | 0.9521
Blimp Wh Questions Subject Gap Long Distance Filtered | 0.9498 | 0.9557 | 0.9172 | 0.9253 | 0.9335 | 0.9148 | 0.9417 | 0.9113
Blimp Wh Vs That No Gap Filtered | 0.8769 | 0.9582 | 0.9768 | 0.9756 | 0.9779 | 0.9779 | 0.9768 | 0.9826
Blimp Wh Vs That No Gap Long Distance Filtered | 0.952 | 0.9817 | 0.9863 | 0.9726 | 0.9897 | 0.9863 | 0.9874 | 0.9829
Blimp Wh Vs That With Gap Filtered | 0.3515 | 0.3482 | 0.2949 | 0.3852 | 0.3656 | 0.3798 | 0.3721 | 0.3852
Blimp Wh Vs That With Gap Long Distance Filtered | 0.1044 | 0.1121 | 0.0813 | 0.1275 | 0.0549 | 0.0967 | 0.0769 | 0.0912
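
The Epoch and Step values in the table above allow a rough estimate of the training set size, which the card does not state. The sketch below divides step by epoch at each checkpoint. It assumes one optimizer step per batch of 32 sequences on a single device with no gradient accumulation, which the card does not confirm, so the sequence count is an estimate only.

```python
# (step, epoch) pairs copied from the training results table.
checkpoints = [
    (50_000, 0.8323), (100_000, 1.6645), (150_000, 2.4968), (200_000, 3.3290),
    (250_000, 4.1613), (300_000, 4.9935), (350_000, 5.8258), (400_000, 6.6580),
]
train_batch_size = 32  # from the hyperparameter list

# Each checkpoint gives an independent estimate of optimizer steps per epoch.
steps_per_epoch = [step / epoch for step, epoch in checkpoints]
mean_steps_per_epoch = sum(steps_per_epoch) / len(steps_per_epoch)

# Assumption: one step consumes exactly one batch of `train_batch_size` sequences.
approx_sequences_per_epoch = mean_steps_per_epoch * train_batch_size

print(f"steps per epoch:     ~{mean_steps_per_epoch:,.0f}")
print(f"sequences per epoch: ~{approx_sequences_per_epoch:,.0f}")
```

The per-checkpoint estimates agree closely with one another, which suggests the Epoch values are simply the step count divided by a fixed number of steps per epoch.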

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.4.0+cu118
  • Datasets 2.18.0
  • Tokenizers 0.19.1

Safetensors

  • Model size: 97.5M params
  • Tensor type: F32