flan-xl-gen6

This model is a fine-tuned version of ybelkada/flan-t5-xl-sharded-bf16 (a sharded bf16 copy of FLAN-T5-XL, ~2.85B parameters) on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4978
  • ROUGE-1: 29.5362
  • ROUGE-2: 20.6621
  • ROUGE-L: 25.7689
  • ROUGE-Lsum: 26.2351
  • Gen Len: 12.7388
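
The checkpoint can be loaded through the standard Transformers seq2seq API. Below is a minimal inference sketch, assuming the published weights load directly as a seq2seq model; the prompt is a placeholder, since the fine-tuning task is not documented here:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "devvanshhh/flan-xl-gen6"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Placeholder prompt; the actual task/prompt format is not documented.
inputs = tokenizer("summarize: <your input text>", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```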

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged Seq2SeqTrainingArguments sketch follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 800
  • num_epochs: 8
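
For reference, this is a sketch of Seq2SeqTrainingArguments mirroring the values above. The output_dir, evaluation strategy, and predict_with_generate settings are assumptions (the results table logs per-epoch evaluation with ROUGE, which is what they would enable); the Adam betas and epsilon listed are the library defaults:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-xl-gen6",       # hypothetical
    learning_rate=5e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_steps=800,
    num_train_epochs=8,
    evaluation_strategy="epoch",     # assumption: table shows per-epoch eval
    predict_with_generate=True,      # assumption: needed for ROUGE / Gen Len
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the Transformers defaults.
)
```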

Training results

| Training Loss | Epoch | Step | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:----------:|:-------:|
| No log        | 1.0   | 328  | 0.6921          | 34.9112 | 26.7503 | 31.4124 | 31.7295    | 10.0172 |
| 6.8746        | 2.0   | 656  | 0.6025          | 33.9134 | 25.3236 | 30.1968 | 30.472     | 10.8454 |
| 6.8746        | 3.0   | 984  | 0.5687          | 31.6178 | 22.9463 | 27.8758 | 28.3572    | 11.8729 |
| 0.6462        | 4.0   | 1312 | 0.5355          | 30.8157 | 22.1783 | 27.1641 | 27.569     | 12.1306 |
| 0.5618        | 5.0   | 1640 | 0.5160          | 29.9183 | 21.0842 | 26.1671 | 26.5965    | 12.5017 |
| 0.5618        | 6.0   | 1968 | 0.5025          | 29.7823 | 21.1443 | 26.0286 | 26.5215    | 12.5086 |
| 0.498         | 7.0   | 2296 | 0.4978          | 29.1043 | 20.2391 | 25.3347 | 25.804     | 12.8969 |
| 0.4551        | 8.0   | 2624 | 0.4978          | 29.5362 | 20.6621 | 25.7689 | 26.2351    | 12.7388 |
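
The ROUGE columns match the format reported by the `evaluate` library's rouge metric scaled by 100, which is how such cards are typically produced. A minimal sketch of that computation, with placeholder predictions and references:

```python
import evaluate

rouge = evaluate.load("rouge")

# Placeholders; real predictions/references would come from the
# (undocumented) evaluation set.
predictions = ["a generated summary"]
references = ["a reference summary"]

scores = rouge.compute(predictions=predictions, references=references)
print({k: round(v * 100, 4) for k, v in scores.items()})
```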

Framework versions

  • Transformers 4.35.2
  • Pytorch 2.1.0+cu118
  • Datasets 2.15.0
  • Tokenizers 0.15.0