	---
	license: mit
	base_model: gpt2-medium
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	model-index:
	- name: results
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# results

	This model is a fine-tuned version of [gpt2-medium](https://huggingface.co/gpt2-medium) on an unspecified dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.5570
	- Accuracy: 0.7508

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 3
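
The list gives `lr_scheduler_type: linear` but no warmup setting. As a rough sketch, assuming zero warmup steps and taking the 4200 total optimizer steps from the last row of the training results table, the learning rate would decay linearly from 5e-05 to 0. The function name `linear_lr` and its parameters are illustrative and not part of Transformers.

```python
def linear_lr(step: int, base_lr: float = 5e-05, total_steps: int = 4200,
              warmup_steps: int = 0) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        # Warmup phase (unused here, since warmup_steps is assumed to be 0).
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Start of training, end of each epoch (1400 steps per epoch in the table).
    for step in (0, 1400, 2800, 4200):
        print(step, linear_lr(step))
```

Under these assumptions the rate is halved at step 2100 and reaches 0 at step 4200. If the run did use warmup, pass the actual value as `warmup_steps`.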

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Accuracy |
	|:-------------:|:-----:|:----:|:---------------:|:--------:|
	| 0.6473 | 0.04 | 50 | 0.5683 | 0.7454 |
	| 0.6367 | 0.07 | 100 | 0.5670 | 0.7525 |
	| 0.6016 | 0.11 | 150 | 0.5676 | 0.7508 |
	| 0.6014 | 0.14 | 200 | 0.5498 | 0.75 |
	| 0.5801 | 0.18 | 250 | 0.5446 | 0.75 |
	| 0.4534 | 0.21 | 300 | 0.5383 | 0.7512 |
	| 0.669 | 0.25 | 350 | 0.5700 | 0.75 |
	| 0.5556 | 0.29 | 400 | 0.5536 | 0.7496 |
	| 0.5652 | 0.32 | 450 | 0.6341 | 0.75 |
	| 0.5801 | 0.36 | 500 | 0.5416 | 0.7454 |
	| 0.6476 | 0.39 | 550 | 0.5319 | 0.7508 |
	| 0.5473 | 0.43 | 600 | 0.5422 | 0.7492 |
	| 0.5094 | 0.46 | 650 | 0.5532 | 0.7504 |
	| 0.5656 | 0.5 | 700 | 0.5375 | 0.7504 |
	| 0.532 | 0.54 | 750 | 0.5617 | 0.7137 |
	| 0.5738 | 0.57 | 800 | 0.5501 | 0.7521 |
	| 0.544 | 0.61 | 850 | 0.5449 | 0.7538 |
	| 0.5271 | 0.64 | 900 | 0.5682 | 0.7496 |
	| 0.9725 | 0.68 | 950 | 0.7980 | 0.4921 |
	| 0.5955 | 0.71 | 1000 | 0.5220 | 0.7538 |
	| 0.5588 | 0.75 | 1050 | 0.5247 | 0.75 |
	| 0.612 | 0.79 | 1100 | 0.5183 | 0.7483 |
	| 0.6124 | 0.82 | 1150 | 0.5260 | 0.7542 |
	| 0.421 | 0.86 | 1200 | 0.5509 | 0.7508 |
	| 0.4582 | 0.89 | 1250 | 0.5249 | 0.75 |
	| 0.588 | 0.93 | 1300 | 0.5633 | 0.7267 |
	| 0.549 | 0.96 | 1350 | 0.5179 | 0.7492 |
	| 0.495 | 1.0 | 1400 | 0.5456 | 0.7512 |
	| 0.435 | 1.04 | 1450 | 0.5596 | 0.7504 |
	| 0.6061 | 1.07 | 1500 | 0.5421 | 0.7433 |
	| 0.5542 | 1.11 | 1550 | 0.5117 | 0.7554 |
	| 0.4277 | 1.14 | 1600 | 0.5291 | 0.7521 |
	| 0.4415 | 1.18 | 1650 | 0.5354 | 0.7538 |
	| 0.5029 | 1.21 | 1700 | 0.5084 | 0.7579 |
	| 0.6079 | 1.25 | 1750 | 0.5798 | 0.7554 |
	| 0.5692 | 1.29 | 1800 | 0.5003 | 0.755 |
	| 0.5297 | 1.32 | 1850 | 0.5563 | 0.7588 |
	| 0.6938 | 1.36 | 1900 | 0.5064 | 0.7529 |
	| 0.5679 | 1.39 | 1950 | 0.5505 | 0.7508 |
	| 0.4503 | 1.43 | 2000 | 0.5133 | 0.7554 |
	| 0.519 | 1.46 | 2050 | 0.4946 | 0.7525 |
	| 0.513 | 1.5 | 2100 | 0.5156 | 0.7283 |
	| 0.5393 | 1.54 | 2150 | 0.5003 | 0.7546 |
	| 0.6162 | 1.57 | 2200 | 0.4916 | 0.7625 |
	| 0.5526 | 1.61 | 2250 | 0.4980 | 0.755 |
	| 0.4472 | 1.64 | 2300 | 0.5001 | 0.76 |
	| 0.5678 | 1.68 | 2350 | 0.4958 | 0.7558 |
	| 0.3894 | 1.71 | 2400 | 0.4968 | 0.7646 |
	| 0.4086 | 1.75 | 2450 | 0.5065 | 0.7583 |
	| 0.4652 | 1.79 | 2500 | 0.5091 | 0.7567 |
	| 0.4837 | 1.82 | 2550 | 0.5190 | 0.7312 |
	| 0.4745 | 1.86 | 2600 | 0.4998 | 0.7567 |
	| 0.456 | 1.89 | 2650 | 0.5035 | 0.7558 |
	| 0.5784 | 1.93 | 2700 | 0.4997 | 0.7504 |
	| 0.452 | 1.96 | 2750 | 0.5315 | 0.7517 |
	| 0.5682 | 2.0 | 2800 | 0.5827 | 0.7521 |
	| 0.6134 | 2.04 | 2850 | 0.4944 | 0.7421 |
	| 0.3451 | 2.07 | 2900 | 0.5505 | 0.7575 |
	| 0.3682 | 2.11 | 2950 | 0.5122 | 0.7504 |
	| 0.3737 | 2.14 | 3000 | 0.8033 | 0.7546 |
	| 0.4899 | 2.18 | 3050 | 0.5645 | 0.7446 |
	| 0.4885 | 2.21 | 3100 | 0.5229 | 0.7554 |
	| 0.4121 | 2.25 | 3150 | 0.5172 | 0.7425 |
	| 0.3926 | 2.29 | 3200 | 0.5685 | 0.7512 |
	| 0.4242 | 2.32 | 3250 | 0.5380 | 0.7425 |
	| 0.4133 | 2.36 | 3300 | 0.5996 | 0.7488 |
	| 0.4322 | 2.39 | 3350 | 0.5769 | 0.7533 |
	| 0.4561 | 2.43 | 3400 | 0.5525 | 0.7583 |
	| 0.2765 | 2.46 | 3450 | 0.5399 | 0.7546 |
	| 0.4422 | 2.5 | 3500 | 0.5782 | 0.7554 |
	| 0.4343 | 2.54 | 3550 | 0.5325 | 0.7338 |
	| 0.3551 | 2.57 | 3600 | 0.5518 | 0.7504 |
	| 0.4058 | 2.61 | 3650 | 0.5585 | 0.7579 |
	| 0.4838 | 2.64 | 3700 | 0.5433 | 0.7379 |
	| 0.3821 | 2.68 | 3750 | 0.5244 | 0.7562 |
	| 0.4906 | 2.71 | 3800 | 0.5202 | 0.7525 |
	| 0.3046 | 2.75 | 3850 | 0.5430 | 0.7575 |
	| 0.4317 | 2.79 | 3900 | 0.5369 | 0.7546 |
	| 0.5641 | 2.82 | 3950 | 0.5406 | 0.7546 |
	| 0.4866 | 2.86 | 4000 | 0.5454 | 0.7546 |
	| 0.3687 | 2.89 | 4050 | 0.5450 | 0.7558 |
	| 0.484 | 2.93 | 4100 | 0.5456 | 0.7521 |
	| 0.2599 | 2.96 | 4150 | 0.5472 | 0.7533 |
	| 0.3381 | 3.0 | 4200 | 0.5461 | 0.7508 |
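
To pick a checkpoint from a log like the one above, the rows can be parsed and ranked by validation loss. The following is a minimal sketch that uses only a five-row excerpt of the table; `parse_rows`, `best_by_val_loss`, and the field names are hypothetical helpers introduced for illustration.

```python
# Five rows copied from the training results table (an excerpt, not the full log).
EXCERPT = """
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 0.5393 | 1.54 | 2150 | 0.5003 | 0.7546 |
| 0.6162 | 1.57 | 2200 | 0.4916 | 0.7625 |
| 0.5526 | 1.61 | 2250 | 0.4980 | 0.755 |
| 0.3894 | 1.71 | 2400 | 0.4968 | 0.7646 |
| 0.3381 | 3.0 | 4200 | 0.5461 | 0.7508 |
"""

FIELDS = ("train_loss", "epoch", "step", "val_loss", "accuracy")


def parse_rows(markdown: str) -> list[dict]:
    """Parse the numeric rows of a Markdown table into dicts (step as int)."""
    rows = []
    for line in markdown.strip().splitlines():
        # Tolerate escaped pipes ("\|") as well as plain ones.
        cleaned = line.replace("\\|", "|").strip().strip("|")
        cells = [cell.strip() for cell in cleaned.split("|")]
        try:
            values = [float(cell) for cell in cells]
        except ValueError:
            continue  # header or alignment row
        if len(values) != len(FIELDS):
            continue
        row = dict(zip(FIELDS, values))
        row["step"] = int(row["step"])
        rows.append(row)
    return rows


def best_by_val_loss(rows: list[dict]) -> dict:
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row["val_loss"])


if __name__ == "__main__":
    best = best_by_val_loss(parse_rows(EXCERPT))
    print(best["step"], best["val_loss"], best["accuracy"])
```

On this excerpt the sketch selects step 2200 (validation loss 0.4916), which is also the lowest validation loss in the full table. The highest accuracy in the table, 0.7646, occurs at a different step (2400), so the choice of ranking metric matters.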


	### Framework versions

	- Transformers 4.31.0
	- PyTorch 2.0.1+cu118
	- Datasets 2.14.4
	- Tokenizers 0.13.3
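
When reproducing the run, it can help to compare the installed packages against the versions listed above. The following is a small sketch built on the standard library's `importlib.metadata`. The PyPI distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are assumed to correspond to the listed frameworks, and `installed_version` and `check_versions` are hypothetical helpers.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above, keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "transformers": "4.31.0",
    "torch": "2.0.1+cu118",
    "datasets": "2.14.4",
    "tokenizers": "0.13.3",
}


def installed_version(name: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_versions(expected: dict, lookup=installed_version) -> dict:
    """Map each package whose installed version differs to (expected, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

The comparison is an exact string match, so a CPU-only or differently built `torch` (for example `2.0.1` without the `+cu118` suffix) is reported as a mismatch.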
