zera09/long_t5_test


	---
	license: apache-2.0
	base_model: google/long-t5-tglobal-base
	tags:
	- generated_from_trainer
	model-index:
	- name: long_t5_test
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# long_t5_test

	This model is a fine-tuned version of [google/long-t5-tglobal-base](https://huggingface.co/google/long-t5-tglobal-base) on an unspecified dataset.
	It achieves the following results on the evaluation set:
	- Loss: 1.3506
	- ROUGE-1: 0.4697
	- ROUGE-2: 0.1989
	- ROUGE-L: 0.274
	- ROUGE-Lsum: 0.2736
	- Gen Len: 388.0152
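
A minimal inference sketch follows. It assumes the checkpoint is published on the Hub as `zera09/long_t5_test` and that it was fine-tuned for summarization, which the ROUGE metrics suggest but the card does not state. The generation settings and the input length limit are illustrative guesses, not the values used to produce the numbers above.

```python
# Sketch only: the task (summarization) and all generation settings are assumptions.
MODEL_ID = "zera09/long_t5_test"

# Assumed settings; the card does not record the ones used for evaluation.
GENERATION_KWARGS = {"max_new_tokens": 400, "num_beams": 4}


def summarize(text: str, model_id: str = MODEL_ID, max_input_tokens: int = 4096) -> str:
    """Generate one output sequence for `text` with the fine-tuned LongT5 checkpoint."""
    # Imported lazily so this module can be read and tested without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Truncate long inputs to the assumed limit and return PyTorch tensors.
    inputs = tokenizer(
        text, return_tensors="pt", truncation=True, max_length=max_input_tokens
    )
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize(document_text)` downloads the weights on first use. If the model was trained with a task prefix or a different input length, the call would need to be adjusted accordingly.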

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 2e-05
	- train_batch_size: 2
	- eval_batch_size: 2
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 20
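
The linear scheduler listed above can be sketched in plain Python. The total of 7880 steps is taken from the last row of the training results. The warmup length is an assumption (the card does not list one, so zero is used here), and the dataset-size estimate assumes a single device with no gradient accumulation.

```python
BASE_LR = 2e-5       # learning_rate from the hyperparameter list
TOTAL_STEPS = 7880   # final step reported in the training results
NUM_EPOCHS = 20      # num_epochs from the hyperparameter list
TRAIN_BATCH_SIZE = 2  # train_batch_size from the hyperparameter list
WARMUP_STEPS = 0     # assumption: warmup is not listed in the card


def linear_lr(step: int, base_lr: float = BASE_LR, total_steps: int = TOTAL_STEPS,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Steps per epoch, and an upper bound on the number of training examples
# (assumes one device and no gradient accumulation; the last batch may be partial).
steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS
max_train_examples = steps_per_epoch * TRAIN_BATCH_SIZE

for step in (0, 1970, 3940, 7880):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at 2e-05, is roughly 1e-05 halfway through training, and reaches zero at step 7880. The 394 steps per epoch imply a training set of at most 788 examples.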

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Gen Len |
	|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:----------:|:-------:|
	| No log | 1.0 | 394 | 1.9389 | 0.0284 | 0.0089 | 0.0167 | 0.0165 | 30.2273 |
	| 3.6937 | 2.0 | 788 | 1.4702 | 0.4261 | 0.1598 | 0.254 | 0.2539 | 399.0 |
	| 1.8772 | 3.0 | 1182 | 1.4362 | 0.4397 | 0.1699 | 0.2592 | 0.2591 | 398.5152 |
	| 1.7418 | 4.0 | 1576 | 1.4204 | 0.4434 | 0.1779 | 0.2627 | 0.2628 | 397.7374 |
	| 1.7418 | 5.0 | 1970 | 1.4108 | 0.4474 | 0.181 | 0.2631 | 0.263 | 394.798 |
	| 1.6623 | 6.0 | 2364 | 1.3932 | 0.4546 | 0.1873 | 0.2675 | 0.2673 | 391.8586 |
	| 1.6449 | 7.0 | 2758 | 1.3872 | 0.4559 | 0.1882 | 0.2665 | 0.2664 | 393.4848 |
	| 1.5757 | 8.0 | 3152 | 1.3814 | 0.458 | 0.1906 | 0.2692 | 0.2692 | 397.1061 |
	| 1.5527 | 9.0 | 3546 | 1.3718 | 0.4607 | 0.1912 | 0.2705 | 0.2706 | 391.7222 |
	| 1.5527 | 10.0 | 3940 | 1.3703 | 0.4649 | 0.194 | 0.2717 | 0.2719 | 393.8788 |
	| 1.5302 | 11.0 | 4334 | 1.3621 | 0.4664 | 0.197 | 0.2726 | 0.2724 | 386.2071 |
	| 1.5142 | 12.0 | 4728 | 1.3537 | 0.4694 | 0.1977 | 0.2731 | 0.2731 | 388.9798 |
	| 1.4721 | 13.0 | 5122 | 1.3528 | 0.4652 | 0.1961 | 0.2716 | 0.2714 | 390.2828 |
	| 1.4745 | 14.0 | 5516 | 1.3550 | 0.4708 | 0.2009 | 0.2742 | 0.2739 | 393.8131 |
	| 1.4745 | 15.0 | 5910 | 1.3500 | 0.471 | 0.199 | 0.2742 | 0.2741 | 385.4192 |
	| 1.4799 | 16.0 | 6304 | 1.3505 | 0.4725 | 0.2008 | 0.2764 | 0.2761 | 387.6364 |
	| 1.4558 | 17.0 | 6698 | 1.3535 | 0.4743 | 0.2032 | 0.2765 | 0.2764 | 389.4192 |
	| 1.4426 | 18.0 | 7092 | 1.3494 | 0.4743 | 0.2042 | 0.278 | 0.2776 | 386.4394 |
	| 1.4426 | 19.0 | 7486 | 1.3513 | 0.4719 | 0.2019 | 0.2753 | 0.2752 | 388.6515 |
	| 1.4411 | 20.0 | 7880 | 1.3506 | 0.4697 | 0.1989 | 0.274 | 0.2736 | 388.0152 |
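
To see which epoch scored best on each metric, the table can be scanned with a few lines of Python. The values below are copied from the rows above. The tuple layout and the helper name `best_epoch` are illustrative choices, not part of the Trainer output.

```python
# (epoch, validation_loss, rouge1, rouge2, rougeL), copied from the table above.
RESULTS = [
    (1, 1.9389, 0.0284, 0.0089, 0.0167), (2, 1.4702, 0.4261, 0.1598, 0.2540),
    (3, 1.4362, 0.4397, 0.1699, 0.2592), (4, 1.4204, 0.4434, 0.1779, 0.2627),
    (5, 1.4108, 0.4474, 0.1810, 0.2631), (6, 1.3932, 0.4546, 0.1873, 0.2675),
    (7, 1.3872, 0.4559, 0.1882, 0.2665), (8, 1.3814, 0.4580, 0.1906, 0.2692),
    (9, 1.3718, 0.4607, 0.1912, 0.2705), (10, 1.3703, 0.4649, 0.1940, 0.2717),
    (11, 1.3621, 0.4664, 0.1970, 0.2726), (12, 1.3537, 0.4694, 0.1977, 0.2731),
    (13, 1.3528, 0.4652, 0.1961, 0.2716), (14, 1.3550, 0.4708, 0.2009, 0.2742),
    (15, 1.3500, 0.4710, 0.1990, 0.2742), (16, 1.3505, 0.4725, 0.2008, 0.2764),
    (17, 1.3535, 0.4743, 0.2032, 0.2765), (18, 1.3494, 0.4743, 0.2042, 0.2780),
    (19, 1.3513, 0.4719, 0.2019, 0.2753), (20, 1.3506, 0.4697, 0.1989, 0.2740),
]

COLUMNS = {"loss": 1, "rouge1": 2, "rouge2": 3, "rougeL": 4}


def best_epoch(metric: str) -> int:
    """Return the epoch with the lowest loss or the highest ROUGE score."""
    column = COLUMNS[metric]
    pick = min if metric == "loss" else max  # lower is better only for loss
    return pick(RESULTS, key=lambda row: row[column])[0]


print(best_epoch("loss"))    # 18
print(best_epoch("rouge2"))  # 18
print(best_epoch("rougeL"))  # 18
```

The headline numbers at the top of the card match epoch 20, while epoch 18 has the lowest validation loss and the highest ROUGE-2 and ROUGE-L. The card does not record whether the best or the final checkpoint was saved.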


	### Framework versions

	- Transformers 4.37.2
	- PyTorch 2.1.1+cu121
	- Datasets 3.0.1
	- Tokenizers 0.15.1
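
A small check like the following can flag differences between a local environment and the versions listed above. It is a sketch: the helper name `find_mismatches` is hypothetical, the package names are the usual PyPI distribution names, and local build tags such as `+cu121` are ignored on the assumption that only the release number matters.

```python
from importlib import metadata

# Release numbers from the list above, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.37.2",
    "torch": "2.1.1",
    "datasets": "3.0.1",
    "tokenizers": "0.15.1",
}


def find_mismatches(expected=EXPECTED, get_version=metadata.version):
    """Return {package: installed_version_or_None} for every package that differs."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        # Drop local build tags such as "+cu121" before comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[package] = installed
    return mismatches


if __name__ == "__main__":
    print(find_mismatches() or "All listed versions match.")
```

An empty result means the installed releases match the card. A mismatch does not necessarily prevent the model from loading, but it is a reasonable first thing to check when results differ.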