yesj1234
/

ko-en_mbartLarge_exp20p_linear_3gram


	---
	language:
	- ko
	- en
	base_model: facebook/mbart-large-50-many-to-many-mmt
	tags:
	- generated_from_trainer
	metrics:
	- bleu
	model-index:
	- name: ko-en_mbartLarge_exp20p_linear
	results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# ko-en_mbartLarge_exp20p_linear

	This model is a fine-tuned version of [facebook/mbart-large-50-many-to-many-mmt](https://huggingface.co/facebook/mbart-large-50-many-to-many-mmt) on an unknown dataset.
	It achieves the following results on the evaluation set:
	- Loss: 1.1514
	- Bleu: 29.2703
	- Gen Len: 18.512
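A minimal sketch of how the checkpoint might be loaded for Korean-to-English translation with the `transformers` library. It rests on several assumptions not stated in this card: that the repository id is the one shown on the Hub page, that the repository ships its own tokenizer files (if it does not, the base model's tokenizer could be tried instead), and that the mBART-50 language codes `ko_KR` and `en_XX` apply. The `translate` helper and its `max_new_tokens` default are illustrative, not part of the original training setup.

```python
# Assumption: repo id as shown on the Hub page for this card.
MODEL_ID = "yesj1234/ko-en_mbartLarge_exp20p_linear_3gram"
# Assumption: standard mBART-50 language codes for Korean and English.
SRC_LANG = "ko_KR"
TGT_LANG = "en_XX"


def translate(sentences, max_new_tokens=64):
    """Translate a list of Korean sentences into English (illustrative helper)."""
    # Imported lazily so this file can be read or imported without the
    # heavy dependencies installed.
    import torch
    from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

    tokenizer = MBart50TokenizerFast.from_pretrained(
        MODEL_ID, src_lang=SRC_LANG, tgt_lang=TGT_LANG
    )
    model = MBartForConditionalGeneration.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        generated = model.generate(
            **inputs,
            # mBART-50 expects the target language code as the first decoder token.
            forced_bos_token_id=tokenizer.convert_tokens_to_ids(TGT_LANG),
            max_new_tokens=max_new_tokens,
        )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)
```

Calling `translate(["안녕하세요, 만나서 반갑습니다."])` should return a one-element list containing the English translation; the first call downloads the model weights.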

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 4
	- eval_batch_size: 4
	- seed: 42
	- distributed_type: multi-GPU
	- num_devices: 4
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 32
	- total_eval_batch_size: 16
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 2000
	- num_epochs: 40
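The batch-size totals and the linear schedule listed above can be checked with a short sketch. The effective batch sizes follow directly from the listed values. The learning-rate function mirrors the usual linear warmup followed by linear decay; its `total_steps` default is an assumption, estimated from the roughly 8,600 optimizer steps per epoch implied by the training log multiplied by 40 epochs, and is not a value reported by the Trainer.

```python
def effective_batch_size(per_device, num_devices, grad_accum_steps=1):
    """Number of samples contributing to one optimizer (or evaluation) step."""
    return per_device * num_devices * grad_accum_steps


def linear_warmup_lr(step, base_lr=5e-5, warmup_steps=2000, total_steps=344_000):
    """Learning rate at `step` under linear warmup then linear decay.

    total_steps is an assumed estimate (about 8,600 steps/epoch x 40 epochs).
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


train_total = effective_batch_size(4, 4, grad_accum_steps=2)  # 4 * 4 * 2
eval_total = effective_batch_size(4, 4)  # no accumulation at eval time

print(train_total, eval_total)
print(linear_warmup_lr(1000), linear_warmup_lr(2000))
```

The two totals reproduce `total_train_batch_size: 32` and `total_eval_batch_size: 16`; the peak learning rate of 5e-05 is reached at step 2000, when warmup ends.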

	### Training results

	| Training Loss | Epoch | Step  | Validation Loss | Bleu    | Gen Len |
	|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|
	| 1.3977        | 0.46  | 4000  | 1.3720          | 22.7153 | 18.7135 |
	| 1.2824        | 0.93  | 8000  | 1.2633          | 24.8579 | 18.7821 |
	| 1.1989        | 1.39  | 12000 | 1.2069          | 26.2533 | 18.7975 |
	| 1.1534        | 1.86  | 16000 | 1.1907          | 26.1503 | 19.2075 |
	| 1.0245        | 2.32  | 20000 | 1.1464          | 27.8764 | 18.6046 |
	| 1.0186        | 2.78  | 24000 | 1.1286          | 28.4585 | 18.6731 |
	| 0.9245        | 3.25  | 28000 | 1.1264          | 28.4834 | 18.5428 |
	| 0.9343        | 3.71  | 32000 | 1.1182          | 28.8235 | 18.7833 |
	| 0.8215        | 4.18  | 36000 | 1.1331          | 28.6134 | 18.5656 |
	| 0.8456        | 4.64  | 40000 | 1.1203          | 28.7324 | 18.459  |
	| 0.7437        | 5.11  | 44000 | 1.1458          | 28.7297 | 18.7835 |
	| 0.7829        | 5.57  | 48000 | 1.1367          | 28.8328 | 18.6052 |
	| 0.7434        | 6.03  | 52000 | 1.1697          | 28.2106 | 18.4871 |
	| 0.7153        | 6.5   | 56000 | 1.1771          | 28.1455 | 18.7413 |
	| 0.6996        | 6.96  | 60000 | 1.1514          | 29.2694 | 18.5162 |
	| 0.6336        | 7.43  | 64000 | 1.2213          | 28.1465 | 18.5439 |
	| 0.7218        | 7.89  | 68000 | 1.1835          | 28.2245 | 18.5246 |
	| 0.5934        | 8.35  | 72000 | 1.2387          | 28.3836 | 18.6717 |
	| 0.5723        | 8.82  | 76000 | 1.2323          | 28.5925 | 18.5566 |
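As a rough aid for reading the log, the sketch below picks out the evaluation steps with the highest BLEU and the lowest validation loss. It assumes the three evaluation columns are read as validation loss, BLEU, and generation length, which is the reading that matches the summary at the top of this card (loss 1.1514, BLEU about 29.27). The `EVAL_LOG` name and the tuple layout are illustrative only.

```python
# (step, validation_loss, bleu) transcribed from the training results table.
EVAL_LOG = [
    (4000, 1.3720, 22.7153), (8000, 1.2633, 24.8579), (12000, 1.2069, 26.2533),
    (16000, 1.1907, 26.1503), (20000, 1.1464, 27.8764), (24000, 1.1286, 28.4585),
    (28000, 1.1264, 28.4834), (32000, 1.1182, 28.8235), (36000, 1.1331, 28.6134),
    (40000, 1.1203, 28.7324), (44000, 1.1458, 28.7297), (48000, 1.1367, 28.8328),
    (52000, 1.1697, 28.2106), (56000, 1.1771, 28.1455), (60000, 1.1514, 29.2694),
    (64000, 1.2213, 28.1465), (68000, 1.1835, 28.2245), (72000, 1.2387, 28.3836),
    (76000, 1.2323, 28.5925),
]

# Highest BLEU and lowest validation loss need not occur at the same step.
best_bleu = max(EVAL_LOG, key=lambda row: row[2])
lowest_loss = min(EVAL_LOG, key=lambda row: row[1])

print(f"best BLEU:   step {best_bleu[0]}, BLEU {best_bleu[2]}")
print(f"lowest loss: step {lowest_loss[0]}, loss {lowest_loss[1]}")
```

Under this reading, BLEU peaks at step 60000 (29.2694) while validation loss is lowest at step 32000 (1.1182), so the reported checkpoint appears to have been selected on BLEU rather than on loss.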


	### Framework versions

	- Transformers 4.34.0
	- Pytorch 2.1.0+cu121
	- Datasets 2.14.5
	- Tokenizers 0.14.1