flan-t5-base-instructiongen
Instead of generating questions from text, generate instructions for LLMs!
This model is a fine-tuned version of google/flan-t5-base on the pszemraj/fleece2instructions dataset. It achieves the following results on the evaluation set:
- Loss: 1.0642
- Rouge1: 58.9516
- Rouge2: 41.8006
- Rougel: 56.8249
- Rougelsum: 56.9171
- Gen Len: 13.1493
Intended uses & limitations
Of the three models fine-tuned so far, flan-t5-base is in an awkward position: it has the largest model file size but not the best performance. I'd recommend looking at the other two instead.
This is just a base FLAN model, and is mostly uploaded for comparison with the FLAN-small and bart-base variants.
Additionally, it was trained on a dataset of only instructions+outputs, with the inputs filtered out. This means that the text "1) cookies and cream 2) chocolate chip 3) mint chip 4) oreo" will not get you "Rank the following ice cream flavors: oreo, mint chip, chocolate chip, cookies and cream".
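A minimal usage sketch, assuming the checkpoint is available on the Hugging Face Hub under the ID pszemraj/flan-t5-base-instructiongen and that `transformers` and `torch` are installed. The helper names `load_model` and `generate_instruction`, and the `max_new_tokens` value, are illustrative choices rather than the author's settings.

```python
MODEL_ID = "pszemraj/flan-t5-base-instructiongen"


def load_model(model_id=MODEL_ID):
    """Load the tokenizer and seq2seq model (requires transformers + torch)."""
    # Imported lazily so the rest of this sketch has no hard dependency.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def generate_instruction(text, tokenizer, model, max_new_tokens=48):
    """Return the instruction the model proposes for `text`."""
    # max_new_tokens=48 is an assumed cap, not a value from the model card.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()


# Example call (downloads the checkpoint on first use):
# tokenizer, model = load_model()
# print(generate_instruction("Boil water, steep the tea for three minutes, "
#                            "then add milk to taste.", tokenizer, model))
```

Because the inputs were filtered out during training, expect the generated instruction to describe the output-like text you pass in, not to reconstruct a prompt that would have contained that text as its input.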
Training and evaluation data
See the linked dataset pszemraj/fleece2instructions:
- It is a filtered/formatted version of tatsu-lab/alpaca, prepared for training models that generate instructions for arbitrary text.
- Some of the API examples are intentionally weird to demonstrate the generalizability of the model.
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 8e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.02
- num_epochs: 2.0
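To make the schedule settings concrete, here is a sketch of how the learning rate would evolve under a linear warmup followed by a half-cosine decay, which is the usual meaning of `lr_scheduler_type: cosine` with a warmup ratio. It assumes 724 total optimizer steps (the final step in the training results) and that warmup steps are rounded up; the exact scheduler implementation used for this run is not stated on the card, and the function names are hypothetical.

```python
import math

BASE_LR = 8e-05        # learning_rate from the list above
WARMUP_RATIO = 0.02    # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 724      # assumption: final step count in the training results


def warmup_steps(total_steps=TOTAL_STEPS, ratio=WARMUP_RATIO):
    """Number of warmup steps, rounded up (an assumed rounding rule)."""
    return math.ceil(total_steps * ratio)


def lr_at(step, total_steps=TOTAL_STEPS, base_lr=BASE_LR, ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    warmup = warmup_steps(total_steps, ratio)
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the warmup lasts 15 steps, the rate peaks at 8e-05, and it decays to zero by step 724.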
Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
---|---|---|---|---|---|---|---|---|
1.1939 | 1.0 | 362 | 1.0822 | 58.1758 | 40.9388 | 56.1219 | 56.2464 | 13.2592 |
1.1667 | 2.0 | 724 | 1.0642 | 58.9516 | 41.8006 | 56.8249 | 56.9171 | 13.1493 |
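The step counts in the table can be related to the batch settings with some quick arithmetic. The sketch below only uses numbers reported on this card; the function names are hypothetical, and the per-epoch example count is an approximation because the last step of an epoch may cover a partial batch.

```python
PER_DEVICE_BATCH = 4     # train_batch_size
GRAD_ACCUM_STEPS = 16    # gradient_accumulation_steps
TOTAL_TRAIN_BATCH = 64   # total_train_batch_size
STEPS_PER_EPOCH = 362    # step count at epoch 1.0 in the table


def effective_batch_size(per_device, accum_steps, num_processes=1):
    """Examples consumed per optimizer step."""
    return per_device * accum_steps * num_processes


def approx_examples_per_epoch(steps, batch_size):
    """Upper estimate of examples seen per epoch (last batch may be partial)."""
    return steps * batch_size


# 4 * 16 already equals the reported total of 64.
remaining_factor = TOTAL_TRAIN_BATCH // (PER_DEVICE_BATCH * GRAD_ACCUM_STEPS)
examples = approx_examples_per_epoch(STEPS_PER_EPOCH, TOTAL_TRAIN_BATCH)
```

With the reported values, `examples` comes to 23168, so each epoch covers roughly 23k training examples.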