---
library_name: transformers
license: mit
base_model: facebook/mbart-large-50
tags:
- generated_from_trainer
metrics:
- rouge
model-index:
- name: RuTaskFlow-mBART-T26-200K
  results: []
pipeline_tag: text2text-generation
---

# RuTaskFlow-mBART-T26-200K

This model is a fine-tuned version of [facebook/mbart-large-50](https://huggingface.co/facebook/mbart-large-50) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5087
- Rouge1: 85.72
- Rouge2: 63.57
- Rougel: 83.99
- Rougelsum: 83.99

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 6
- eval_batch_size: 6
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 24
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 4
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum |
|:-------------:|:------:|:-----:|:---------------:|:------:|:------:|:------:|:---------:|
| 0.5496        | 0.9999 | 6803  | 0.5541          | 81.99  | 59.63  | 80.24  | 80.22     |
| 0.4657        | 2.0    | 13607 | 0.5035          | 84.21  | 62.28  | 82.5   | 82.48     |
| 0.307         | 2.9999 | 20410 | 0.4942          | 85.26  | 62.93  | 83.5   | 83.5      |
| 0.2345        | 3.9997 | 27212 | 0.5087          | 85.72  | 63.57  | 83.99  | 83.99     |

### Framework versions

- Transformers 4.45.2
- Pytorch 2.5.1+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3
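
### Example training setup (sketch)

The original training script is not included in this card. The snippet below is a minimal sketch of `Seq2SeqTrainingArguments` that mirrors the hyperparameters listed above; the `output_dir`, dataset wiring, and ROUGE metric function (`compute_rouge`) are assumptions for illustration, not taken from the actual run.

```python
from transformers import (
    MBartForConditionalGeneration,
    MBart50TokenizerFast,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50")

# Mirrors the hyperparameters listed in "Training hyperparameters" above.
training_args = Seq2SeqTrainingArguments(
    output_dir="RuTaskFlow-mBART-T26-200K",  # placeholder output directory
    learning_rate=3e-5,
    per_device_train_batch_size=6,
    per_device_eval_batch_size=6,
    gradient_accumulation_steps=4,   # effective train batch size: 6 * 4 = 24
    num_train_epochs=4,
    lr_scheduler_type="linear",
    seed=42,
    fp16=True,                       # "Native AMP" mixed precision
    eval_strategy="epoch",
    predict_with_generate=True,      # generate text at eval time so ROUGE can be computed
)

# The dataset and metric function are not documented in this card, so the
# Trainer call is only indicated:
# trainer = Seq2SeqTrainer(
#     model=model,
#     args=training_args,
#     train_dataset=train_dataset,    # hypothetical tokenized dataset
#     eval_dataset=eval_dataset,
#     tokenizer=tokenizer,
#     compute_metrics=compute_rouge,  # hypothetical ROUGE metric function
# )
# trainer.train()
```

The Adam betas (0.9, 0.999) and epsilon (1e-08) listed above are the Trainer defaults, so they do not need to be set explicitly.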
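
## How to use (sketch)

The card does not yet describe the expected input/output format. As a starting point, a fine-tuned mBART-50 checkpoint can be loaded and run as below; the Hub repo id, the `ru_RU` source/target language codes (assumed from the "Ru" prefix in the model name), and the generation settings are assumptions to adapt to your setup.

```python
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

# Replace with the actual Hub repo id (e.g. "<owner>/RuTaskFlow-mBART-T26-200K") or a local path.
model_id = "RuTaskFlow-mBART-T26-200K"
tokenizer = MBart50TokenizerFast.from_pretrained(model_id)
model = MBartForConditionalGeneration.from_pretrained(model_id)

# mBART-50 tokenizers need a source language code; Russian is assumed here.
tokenizer.src_lang = "ru_RU"

text = "Пример входного текста."  # example input; use the format the model was fine-tuned on
inputs = tokenizer(text, return_tensors="pt", truncation=True)

output_ids = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("ru_RU"),  # assumed target language
    max_new_tokens=128,
    num_beams=4,
)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```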