TinyLlama-1.1B-Chat-v1.0-sft-chat_threads

This model is a fine-tuned version of mjschock/TinyLlama-1.1B-Chat-v1.0 on the mjschock/chat_threads dataset. It achieves the following results on the evaluation set (a loading sketch follows the metrics list):

  • Loss: 0.5586
  • Bleu: 0.7572
  • Precisions: 0.7641
  • Brevity Penalty: 0.9983
  • Length Ratio: 0.9986
  • Translation Length: 582.3552
  • Reference Length: 582.9104
  • Meteor: 0.7364
  • Rouge1: 0.7900
  • Rouge2: 0.5570
  • Rougel: 0.7250
  • Rougelsum: 0.7838
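
The snippet below is a minimal, hedged sketch of how an adapter like this one can be loaded on top of the base chat model with PEFT. The repository ids are the ones named in this card; the use of the tokenizer's chat template and the generation settings are assumptions, not part of the published training or evaluation code.

```python
# Sketch: load the base model, apply the LoRA adapter, and generate a reply.
# Assumes the base tokenizer ships a chat template (TinyLlama-Chat does).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mjschock/TinyLlama-1.1B-Chat-v1.0"
adapter_id = "mjschock/TinyLlama-1.1B-Chat-v1.0-sft-chat_threads"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach this card's adapter

messages = [{"role": "user", "content": "Summarize the thread so far."}]  # placeholder prompt
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```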

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
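
As a rough guide, the hyperparameters above map onto a Transformers TrainingArguments object as sketched below. The output directory name and the choice of optimizer alias are assumptions; the card only states the values listed above, not the exact training script.

```python
# Hedged configuration sketch mirroring the listed hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="TinyLlama-1.1B-Chat-v1.0-sft-chat_threads",  # assumed name
    learning_rate=5e-05,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=16,   # total train batch size: 1 * 16 = 16
    optim="adamw_torch",              # betas=(0.9, 0.999) and eps=1e-08 are the defaults
    lr_scheduler_type="linear",
    num_train_epochs=3.0,
)
```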

Training results

| Training Loss | Epoch  | Step | Validation Loss | Bleu   | Precisions | Brevity Penalty | Length Ratio | Translation Length | Reference Length | Meteor | Rouge1 | Rouge2 | Rougel | Rougelsum |
|:-------------:|:------:|:----:|:---------------:|:------:|:----------:|:---------------:|:------------:|:------------------:|:----------------:|:------:|:------:|:------:|:------:|:---------:|
| No log        | 0      | 0    | 0.8976          | 0.6391 | 0.6567     | 0.9934          | 0.9936       | 579.7720           | 582.9104         | 0.6775 | 0.6912 | 0.3881 | 0.5809 | 0.6813    |
| 0.7612        | 0.9630 | 13   | 0.7168          | 0.6941 | 0.7056     | 0.9969          | 0.9973       | 581.2681           | 582.9104         | 0.7030 | 0.7375 | 0.4604 | 0.6572 | 0.7281    |
| 0.6321        | 2.0    | 27   | 0.5992          | 0.7420 | 0.7498     | 0.9981          | 0.9981       | 582.0161           | 582.9104         | 0.7312 | 0.7780 | 0.5342 | 0.7069 | 0.7720    |
| 0.5738        | 2.8889 | 39   | 0.5586          | 0.7572 | 0.7641     | 0.9983          | 0.9986       | 582.3552           | 582.9104         | 0.7364 | 0.7900 | 0.5570 | 0.7250 | 0.7838    |
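
The BLEU, METEOR, and ROUGE columns above can be reproduced with the `evaluate` library as sketched below. The exact evaluation script for this run is not published in the card, so the metric configuration and the placeholder predictions/references are assumptions.

```python
# Sketch: compute BLEU, METEOR, and ROUGE on generated vs. reference replies.
import evaluate

bleu = evaluate.load("bleu")
meteor = evaluate.load("meteor")
rouge = evaluate.load("rouge")

predictions = ["the generated reply"]   # placeholder model outputs
references = [["the reference reply"]]  # placeholder gold replies (one list per prediction)

print(bleu.compute(predictions=predictions, references=references))
print(meteor.compute(predictions=predictions, references=[r[0] for r in references]))
print(rouge.compute(predictions=predictions, references=[r[0] for r in references]))
```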

Framework versions

  • PEFT 0.13.2
  • Transformers 4.44.2
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.19.1