babylm-subwords-text-gpt2_lm-model
This model is a fine-tuned version of an unspecified base model on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 2.0621
- Model Preparation Time: 0.0025
- Perplexity: 7.8628
- Bpc: 2.9750
- Babyslm Test Syntactic: 0.8980
- Blimp Supplement: 0.6697
- Blimp Filtered: 0.7626
- Blimp Supplement Hypernym: 0.4869
- Blimp Supplement Qa Congruence Easy: 0.7344
- Blimp Supplement Qa Congruence Tricky: 0.5030
- Blimp Supplement Subject Aux Inversion: 0.8384
- Blimp Supplement Turn Taking: 0.7857
- Blimp Adjunct Island Filtered: 0.8093
- Blimp Anaphor Gender Agreement Filtered: 0.9557
- Blimp Anaphor Number Agreement Filtered: 0.9925
- Blimp Animate Subject Passive Filtered: 0.7520
- Blimp Animate Subject Trans Filtered: 0.8906
- Blimp Causative Filtered: 0.7763
- Blimp Complex Np Island Filtered: 0.4811
- Blimp Coordinate Structure Constraint Complex Left Branch Filtered: 0.5706
- Blimp Coordinate Structure Constraint Object Extraction Filtered: 0.7566
- Blimp Determiner Noun Agreement 1 Filtered: 0.9699
- Blimp Determiner Noun Agreement 2 Filtered: 0.9731
- Blimp Determiner Noun Agreement Irregular 1 Filtered: 0.8664
- Blimp Determiner Noun Agreement Irregular 2 Filtered: 0.9280
- Blimp Determiner Noun Agreement With Adj 2 Filtered: 0.9437
- Blimp Determiner Noun Agreement With Adj Irregular 1 Filtered: 0.8468
- Blimp Determiner Noun Agreement With Adj Irregular 2 Filtered: 0.8702
- Blimp Determiner Noun Agreement With Adjective 1 Filtered: 0.9475
- Blimp Distractor Agreement Relational Noun Filtered: 0.8731
- Blimp Distractor Agreement Relative Clause Filtered: 0.7348
- Blimp Drop Argument Filtered: 0.7370
- Blimp Ellipsis N Bar 1 Filtered: 0.7145
- Blimp Ellipsis N Bar 2 Filtered: 0.7911
- Blimp Existential There Object Raising Filtered: 0.7574
- Blimp Existential There Quantifiers 1 Filtered: 0.9774
- Blimp Existential There Quantifiers 2 Filtered: 0.4468
- Blimp Existential There Subject Raising Filtered: 0.8755
- Blimp Expletive It Object Raising Filtered: 0.7800
- Blimp Inchoative Filtered: 0.6094
- Blimp Intransitive Filtered: 0.7247
- Blimp Irregular Past Participle Adjectives Filtered: 0.9355
- Blimp Irregular Past Participle Verbs Filtered: 0.8142
- Blimp Irregular Plural Subject Verb Agreement 1 Filtered: 0.8831
- Blimp Irregular Plural Subject Verb Agreement 2 Filtered: 0.8756
- Blimp Left Branch Island Echo Question Filtered: 0.3506
- Blimp Left Branch Island Simple Question Filtered: 0.6572
- Blimp Matrix Question Npi Licensor Present Filtered: 0.5597
- Blimp Npi Present 1 Filtered: 0.5193
- Blimp Npi Present 2 Filtered: 0.6149
- Blimp Only Npi Licensor Present Filtered: 0.9433
- Blimp Only Npi Scope Filtered: 0.7957
- Blimp Passive 1 Filtered: 0.8952
- Blimp Passive 2 Filtered: 0.8627
- Blimp Principle A C Command Filtered: 0.6332
- Blimp Principle A Case 1 Filtered: 1.0
- Blimp Principle A Case 2 Filtered: 0.9290
- Blimp Principle A Domain 1 Filtered: 0.9847
- Blimp Principle A Domain 2 Filtered: 0.6623
- Blimp Principle A Domain 3 Filtered: 0.6259
- Blimp Principle A Reconstruction Filtered: 0.1892
- Blimp Regular Plural Subject Verb Agreement 1 Filtered: 0.8933
- Blimp Regular Plural Subject Verb Agreement 2 Filtered: 0.8349
- Blimp Sentential Negation Npi Licensor Present Filtered: 0.9706
- Blimp Sentential Negation Npi Scope Filtered: 0.4363
- Blimp Sentential Subject Island Filtered: 0.3569
- Blimp Superlative Quantifiers 1 Filtered: 0.7783
- Blimp Superlative Quantifiers 2 Filtered: 0.8063
- Blimp Tough Vs Raising 1 Filtered: 0.4568
- Blimp Tough Vs Raising 2 Filtered: 0.8630
- Blimp Transitive Filtered: 0.8410
- Blimp Wh Island Filtered: 0.7146
- Blimp Wh Questions Object Gap Filtered: 0.7520
- Blimp Wh Questions Subject Gap Filtered: 0.9521
- Blimp Wh Questions Subject Gap Long Distance Filtered: 0.9113
- Blimp Wh Vs That No Gap Filtered: 0.9826
- Blimp Wh Vs That No Gap Long Distance Filtered: 0.9829
- Blimp Wh Vs That With Gap Filtered: 0.3852
- Blimp Wh Vs That With Gap Long Distance Filtered: 0.0912
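The card does not say how Perplexity and Bpc are derived, but the reported values are numerically consistent with the usual conversions from a cross-entropy loss measured in nats: perplexity as `exp(loss)` and bits as `loss / ln 2`. The sketch below checks this under that assumption; the variable names are illustrative and not taken from the training code.

```python
import math

# Final evaluation loss reported above, assumed to be in nats per token.
eval_loss = 2.0621

# Assumed conversions (not confirmed by the card):
perplexity = math.exp(eval_loss)   # exp of the mean cross-entropy
bits = eval_loss / math.log(2)     # nats converted to bits

print(f"perplexity ~ {perplexity:.4f}")  # card reports 7.8628
print(f"bits       ~ {bits:.4f}")        # card reports Bpc 2.9750
```

If this reading is right, the Bpc figure is the loss expressed in bits per predicted unit rather than a value renormalized by character count, although the card does not state this.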
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 90000
- training_steps: 400000
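To make the schedule concrete, here is a minimal sketch of the learning rate implied by the values above. It assumes `linear` means the common form used by the Transformers linear scheduler: a linear ramp from 0 to the peak rate over the warmup steps, followed by a linear decay to 0 at the final step. The function name `linear_lr` is hypothetical.

```python
def linear_lr(step, peak_lr=0.001, warmup_steps=90_000, total_steps=400_000):
    """Learning rate at a given step under linear warmup and linear decay (assumed form)."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: fall linearly from peak_lr to 0 at total_steps.
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

for step in (0, 45_000, 90_000, 245_000, 400_000):
    print(step, linear_lr(step))
```

Under this assumption the peak rate of 0.001 is reached only at step 90000, which is after the first checkpoint in the results table (step 50000).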
Training results
Training Loss | Epoch | Step | Validation Loss | Model Preparation Time | Perplexity | Bpc | Babyslm Test Syntactic | Blimp Supplement | Blimp Filtered | Blimp Supplement Hypernym | Blimp Supplement Qa Congruence Easy | Blimp Supplement Qa Congruence Tricky | Blimp Supplement Subject Aux Inversion | Blimp Supplement Turn Taking | Blimp Adjunct Island Filtered | Blimp Anaphor Gender Agreement Filtered | Blimp Anaphor Number Agreement Filtered | Blimp Animate Subject Passive Filtered | Blimp Animate Subject Trans Filtered | Blimp Causative Filtered | Blimp Complex Np Island Filtered | Blimp Coordinate Structure Constraint Complex Left Branch Filtered | Blimp Coordinate Structure Constraint Object Extraction Filtered | Blimp Determiner Noun Agreement 1 Filtered | Blimp Determiner Noun Agreement 2 Filtered | Blimp Determiner Noun Agreement Irregular 1 Filtered | Blimp Determiner Noun Agreement Irregular 2 Filtered | Blimp Determiner Noun Agreement With Adj 2 Filtered | Blimp Determiner Noun Agreement With Adj Irregular 1 Filtered | Blimp Determiner Noun Agreement With Adj Irregular 2 Filtered | Blimp Determiner Noun Agreement With Adjective 1 Filtered | Blimp Distractor Agreement Relational Noun Filtered | Blimp Distractor Agreement Relative Clause Filtered | Blimp Drop Argument Filtered | Blimp Ellipsis N Bar 1 Filtered | Blimp Ellipsis N Bar 2 Filtered | Blimp Existential There Object Raising Filtered | Blimp Existential There Quantifiers 1 Filtered | Blimp Existential There Quantifiers 2 Filtered | Blimp Existential There Subject Raising Filtered | Blimp Expletive It Object Raising Filtered | Blimp Inchoative Filtered | Blimp Intransitive Filtered | Blimp Irregular Past Participle Adjectives Filtered | Blimp Irregular Past Participle Verbs Filtered | Blimp Irregular Plural Subject Verb Agreement 1 Filtered | Blimp Irregular Plural Subject Verb Agreement 2 Filtered | Blimp Left Branch Island Echo Question Filtered | Blimp Left Branch Island Simple Question Filtered | Blimp Matrix Question Npi Licensor Present Filtered | Blimp Npi Present 1 Filtered | Blimp Npi Present 2 Filtered | Blimp Only Npi Licensor Present Filtered | Blimp Only Npi Scope Filtered | Blimp Passive 1 Filtered | Blimp Passive 2 Filtered | Blimp Principle A C Command Filtered | Blimp Principle A Case 1 Filtered | Blimp Principle A Case 2 Filtered | Blimp Principle A Domain 1 Filtered | Blimp Principle A Domain 2 Filtered | Blimp Principle A Domain 3 Filtered | Blimp Principle A Reconstruction Filtered | Blimp Regular Plural Subject Verb Agreement 1 Filtered | Blimp Regular Plural Subject Verb Agreement 2 Filtered | Blimp Sentential Negation Npi Licensor Present Filtered | Blimp Sentential Negation Npi Scope Filtered | Blimp Sentential Subject Island Filtered | Blimp Superlative Quantifiers 1 Filtered | Blimp Superlative Quantifiers 2 Filtered | Blimp Tough Vs Raising 1 Filtered | Blimp Tough Vs Raising 2 Filtered | Blimp Transitive Filtered | Blimp Wh Island Filtered | Blimp Wh Questions Object Gap Filtered | Blimp Wh Questions Subject Gap Filtered | Blimp Wh Questions Subject Gap Long Distance Filtered | Blimp Wh Vs That No Gap Filtered | Blimp Wh Vs That No Gap Long Distance Filtered | Blimp Wh Vs That With Gap Filtered | Blimp Wh Vs That With Gap Long Distance Filtered |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2.1359 | 0.8323 | 50000 | 2.3362 | 0.0025 | 10.3414 | 3.3704 | 0.8505 | 0.6374 | 0.6767 | 0.5131 | 0.6094 | 0.5030 | 0.8580 | 0.7036 | 0.7209 | 0.6910 | 0.8969 | 0.6436 | 0.8180 | 0.6711 | 0.4350 | 0.2528 | 0.6322 | 0.9354 | 0.9517 | 0.7195 | 0.8622 | 0.9107 | 0.7855 | 0.8131 | 0.9025 | 0.4860 | 0.5144 | 0.7228 | 0.5935 | 0.7379 | 0.7808 | 0.9935 | 0.2481 | 0.7727 | 0.7536 | 0.4667 | 0.5933 | 0.9043 | 0.8206 | 0.7624 | 0.8217 | 0.2228 | 0.3922 | 0.2131 | 0.3311 | 0.4136 | 0.8503 | 0.8423 | 0.8036 | 0.7663 | 0.5899 | 1.0 | 0.8973 | 0.9880 | 0.5311 | 0.5590 | 0.3557 | 0.8326 | 0.7376 | 0.9510 | 0.3846 | 0.4849 | 0.7324 | 0.8093 | 0.2584 | 0.8522 | 0.7615 | 0.4396 | 0.5972 | 0.8953 | 0.9498 | 0.8769 | 0.952 | 0.3515 | 0.1044 |
2.0144 | 1.6645 | 100000 | 2.2289 | 0.0025 | 9.2894 | 3.2156 | 0.8839 | 0.6279 | 0.7278 | 0.4929 | 0.6406 | 0.4606 | 0.8381 | 0.7071 | 0.6875 | 0.8661 | 0.9560 | 0.6894 | 0.8537 | 0.6870 | 0.4917 | 0.4117 | 0.6554 | 0.9709 | 0.9656 | 0.7739 | 0.8902 | 0.9299 | 0.7813 | 0.8286 | 0.9357 | 0.7602 | 0.6533 | 0.7413 | 0.6421 | 0.7271 | 0.8140 | 0.9817 | 0.4105 | 0.8323 | 0.7642 | 0.5754 | 0.6740 | 0.9553 | 0.7665 | 0.8545 | 0.8688 | 0.2798 | 0.5668 | 0.3434 | 0.4323 | 0.6007 | 0.9977 | 0.8244 | 0.8571 | 0.7730 | 0.6163 | 1.0 | 0.9235 | 0.9836 | 0.5891 | 0.5632 | 0.2844 | 0.8820 | 0.8169 | 0.9978 | 0.4719 | 0.4693 | 0.9081 | 0.8813 | 0.2964 | 0.8217 | 0.7546 | 0.5490 | 0.6100 | 0.9154 | 0.9557 | 0.9582 | 0.9817 | 0.3482 | 0.1121 |
1.9281 | 2.4968 | 150000 | 2.1594 | 0.0025 | 8.6658 | 3.1153 | 0.8965 | 0.6471 | 0.7344 | 0.5012 | 0.6406 | 0.5273 | 0.8270 | 0.7393 | 0.7295 | 0.9073 | 0.9796 | 0.7017 | 0.8852 | 0.7494 | 0.4480 | 0.4558 | 0.7439 | 0.9580 | 0.9710 | 0.8120 | 0.9049 | 0.9426 | 0.8370 | 0.8464 | 0.9421 | 0.7944 | 0.7038 | 0.7663 | 0.6783 | 0.7428 | 0.7426 | 0.9763 | 0.4863 | 0.8193 | 0.7391 | 0.5661 | 0.6959 | 0.8866 | 0.8089 | 0.8420 | 0.8845 | 0.2482 | 0.6278 | 0.5393 | 0.4554 | 0.5799 | 0.9252 | 0.6762 | 0.8905 | 0.8328 | 0.5825 | 1.0 | 0.9355 | 0.9836 | 0.5421 | 0.5537 | 0.2254 | 0.9045 | 0.8360 | 0.9902 | 0.3054 | 0.3861 | 0.9040 | 0.8590 | 0.3576 | 0.8565 | 0.7995 | 0.6021 | 0.6892 | 0.9131 | 0.9172 | 0.9768 | 0.9863 | 0.2949 | 0.0813 |
1.8715 | 3.3290 | 200000 | 2.1227 | 0.0025 | 8.3534 | 3.0624 | 0.8968 | 0.6640 | 0.7515 | 0.5 | 0.7031 | 0.4970 | 0.8521 | 0.7679 | 0.7694 | 0.9197 | 0.9828 | 0.7106 | 0.8754 | 0.7445 | 0.4456 | 0.5552 | 0.6839 | 0.9677 | 0.9731 | 0.7885 | 0.9037 | 0.9309 | 0.8398 | 0.8512 | 0.9357 | 0.8490 | 0.7313 | 0.7435 | 0.6895 | 0.7681 | 0.7968 | 0.9645 | 0.5049 | 0.8463 | 0.7655 | 0.5942 | 0.6717 | 0.8803 | 0.8524 | 0.8831 | 0.8767 | 0.3210 | 0.6562 | 0.4295 | 0.5336 | 0.6346 | 0.9422 | 0.8220 | 0.8964 | 0.8084 | 0.5719 | 1.0 | 0.9530 | 0.9869 | 0.6055 | 0.6026 | 0.2161 | 0.9247 | 0.8328 | 0.9891 | 0.4064 | 0.3985 | 0.9009 | 0.8306 | 0.4504 | 0.8435 | 0.8157 | 0.6156 | 0.7451 | 0.9376 | 0.9253 | 0.9756 | 0.9726 | 0.3852 | 0.1275 |
1.8345 | 4.1613 | 250000 | 2.0978 | 0.0025 | 8.1484 | 3.0265 | 0.9 | 0.6548 | 0.7531 | 0.4656 | 0.6719 | 0.5030 | 0.8585 | 0.775 | 0.8157 | 0.9598 | 0.9914 | 0.7173 | 0.8657 | 0.7347 | 0.4681 | 0.4812 | 0.7313 | 0.9699 | 0.9635 | 0.8267 | 0.8939 | 0.9458 | 0.8482 | 0.8560 | 0.9528 | 0.8541 | 0.7497 | 0.7587 | 0.7145 | 0.7959 | 0.7648 | 0.9731 | 0.4171 | 0.8701 | 0.7681 | 0.6140 | 0.7293 | 0.9188 | 0.7909 | 0.8719 | 0.8733 | 0.3506 | 0.5910 | 0.6168 | 0.5424 | 0.6324 | 0.9921 | 0.7539 | 0.8940 | 0.8427 | 0.6068 | 1.0 | 0.9399 | 0.9770 | 0.5716 | 0.5919 | 0.2110 | 0.9079 | 0.8455 | 0.9717 | 0.3823 | 0.4100 | 0.7252 | 0.8195 | 0.3713 | 0.8815 | 0.8145 | 0.7198 | 0.7404 | 0.9477 | 0.9335 | 0.9779 | 0.9897 | 0.3656 | 0.0549 |
1.8001 | 4.9935 | 300000 | 2.0813 | 0.0025 | 8.0149 | 3.0027 | 0.8936 | 0.6503 | 0.7545 | 0.4976 | 0.6875 | 0.4667 | 0.8353 | 0.7643 | 0.7457 | 0.9269 | 0.9936 | 0.7464 | 0.8841 | 0.7543 | 0.4787 | 0.5033 | 0.7471 | 0.9623 | 0.9667 | 0.8590 | 0.9354 | 0.9501 | 0.8315 | 0.8738 | 0.9453 | 0.8261 | 0.7187 | 0.7652 | 0.7219 | 0.7899 | 0.7796 | 0.9763 | 0.3359 | 0.8636 | 0.7734 | 0.5860 | 0.7316 | 0.9272 | 0.8142 | 0.8595 | 0.8845 | 0.3157 | 0.6288 | 0.4801 | 0.5050 | 0.6258 | 0.9649 | 0.8363 | 0.8976 | 0.8527 | 0.6342 | 1.0 | 0.9213 | 0.9945 | 0.6601 | 0.5994 | 0.2296 | 0.9124 | 0.8190 | 0.9804 | 0.4363 | 0.3913 | 0.7211 | 0.7890 | 0.4536 | 0.8522 | 0.8410 | 0.7271 | 0.7276 | 0.9399 | 0.9148 | 0.9779 | 0.9863 | 0.3798 | 0.0967 |
1.7665 | 5.8258 | 350000 | 2.0700 | 0.0025 | 7.9247 | 2.9864 | 0.9051 | 0.6706 | 0.7619 | 0.4988 | 0.7344 | 0.5091 | 0.8285 | 0.7821 | 0.8039 | 0.9351 | 0.9903 | 0.7475 | 0.8862 | 0.7555 | 0.4693 | 0.5717 | 0.7155 | 0.9688 | 0.9721 | 0.8634 | 0.9329 | 0.9437 | 0.8482 | 0.8714 | 0.9528 | 0.8744 | 0.7199 | 0.7293 | 0.6945 | 0.7862 | 0.7672 | 0.9731 | 0.4874 | 0.8647 | 0.7721 | 0.6129 | 0.7108 | 0.9116 | 0.8280 | 0.8794 | 0.8890 | 0.3326 | 0.6761 | 0.5210 | 0.5237 | 0.6269 | 0.9524 | 0.8184 | 0.8952 | 0.8583 | 0.5941 | 1.0 | 0.9301 | 0.9825 | 0.6820 | 0.6472 | 0.1996 | 0.9124 | 0.8370 | 0.9826 | 0.4650 | 0.3777 | 0.7273 | 0.8256 | 0.4568 | 0.8489 | 0.8376 | 0.7271 | 0.7730 | 0.9555 | 0.9417 | 0.9768 | 0.9874 | 0.3721 | 0.0769 |
1.7419 | 6.6580 | 400000 | 2.0621 | 0.0025 | 7.8628 | 2.9750 | 0.8980 | 0.6697 | 0.7626 | 0.4869 | 0.7344 | 0.5030 | 0.8384 | 0.7857 | 0.8093 | 0.9557 | 0.9925 | 0.7520 | 0.8906 | 0.7763 | 0.4811 | 0.5706 | 0.7566 | 0.9699 | 0.9731 | 0.8664 | 0.9280 | 0.9437 | 0.8468 | 0.8702 | 0.9475 | 0.8731 | 0.7348 | 0.7370 | 0.7145 | 0.7911 | 0.7574 | 0.9774 | 0.4468 | 0.8755 | 0.7800 | 0.6094 | 0.7247 | 0.9355 | 0.8142 | 0.8831 | 0.8756 | 0.3506 | 0.6572 | 0.5597 | 0.5193 | 0.6149 | 0.9433 | 0.7957 | 0.8952 | 0.8627 | 0.6332 | 1.0 | 0.9290 | 0.9847 | 0.6623 | 0.6259 | 0.1892 | 0.8933 | 0.8349 | 0.9706 | 0.4363 | 0.3569 | 0.7783 | 0.8063 | 0.4568 | 0.8630 | 0.8410 | 0.7146 | 0.7520 | 0.9521 | 0.9113 | 0.9826 | 0.9829 | 0.3852 | 0.0912 |
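The Epoch and Step columns give a rough idea of the size of the unnamed training set. The sketch below divides step by epoch for the first and last rows, then multiplies by the training batch size. It assumes one device and no gradient accumulation, neither of which the card states, so the sequence count is an estimate only.

```python
# (epoch, step) pairs copied from the first and last rows of the table above.
checkpoints = [(0.8323, 50_000), (6.6580, 400_000)]
train_batch_size = 32  # from the hyperparameter list

# Optimizer steps needed to complete one pass over the training data.
estimates = [step / epoch for epoch, step in checkpoints]

for (epoch, step), steps_per_epoch in zip(checkpoints, estimates):
    # Assumption: each step consumes exactly train_batch_size sequences.
    sequences = steps_per_epoch * train_batch_size
    print(f"step {step}: ~{steps_per_epoch:,.0f} steps/epoch, ~{sequences:,.0f} sequences/epoch")
```

Both rows give roughly 60,000 steps per epoch, so 400000 training steps correspond to a little under seven passes over the data.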
Framework versions
- Transformers 4.44.2
- Pytorch 2.4.0+cu118
- Datasets 2.18.0
- Tokenizers 0.19.1