---
license: llama2
base_model: meta-llama/Llama-2-7b-hf
tags:
- generated_from_trainer
datasets:
- tyzhu/lmind_nq_train6000_eval6489_v1_qa
metrics:
- accuracy
model-index:
- name: lmind_nq_train6000_eval6489_v1_qa_5e-4_lora2
  results:
  - task:
      name: Causal Language Modeling
      type: text-generation
    dataset:
      name: tyzhu/lmind_nq_train6000_eval6489_v1_qa
      type: tyzhu/lmind_nq_train6000_eval6489_v1_qa
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.3654871794871795
---

# lmind_nq_train6000_eval6489_v1_qa_5e-4_lora2

This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on the tyzhu/lmind_nq_train6000_eval6489_v1_qa dataset. It achieves the following results on the evaluation set:

- Loss: 5.5751
- Accuracy: 0.3655
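The `lora2` suffix and the `base_model` field suggest this repository hosts LoRA adapter weights rather than a full model. Under that assumption, a minimal loading sketch with 🤗 PEFT might look like this; the adapter repo id is taken from this card, and the prompt format is purely illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the adapter from this repo.
# Assumes the repo contains PEFT adapter files (suggested by the
# "lora2" suffix; not confirmed by the card itself).
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(
    base, "tyzhu/lmind_nq_train6000_eval6489_v1_qa_5e-4_lora2"
)

# Illustrative prompt only; the actual QA prompt template is not documented.
prompt = "Question: who wrote the declaration of independence?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```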

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent `TrainingArguments` follows the list):

- learning_rate: 0.0005
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 50.0
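A minimal `transformers.TrainingArguments` sketch reproducing the listed values; only `output_dir` is an assumption, and the 4-GPU launch (e.g. via `torchrun`) is what turns the per-device batch size of 2 into the total of 32:

```python
from transformers import TrainingArguments

# Effective train batch = 2 per device x 4 GPUs x 4 accumulation steps = 32.
args = TrainingArguments(
    output_dir="lmind_nq_train6000_eval6489_v1_qa_5e-4_lora2",  # assumed
    learning_rate=5e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=4,
    adam_beta1=0.9,    # Adam betas/epsilon as listed (transformers defaults)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="constant",
    warmup_ratio=0.05,  # reproduced as listed; note that the plain "constant"
                        # schedule in transformers applies no warmup
    num_train_epochs=50.0,
)
```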

### Training results

| Training Loss | Epoch | Step | Accuracy | Validation Loss |
|:-------------:|:-----:|:----:|:--------:|:---------------:|
| 1.43          | 1.0   | 187  | 0.6162   | 1.2683          |
| 1.0285        | 2.0   | 375  | 0.6129   | 1.3220          |
| 0.7318        | 3.0   | 562  | 0.6076   | 1.4645          |
| 0.5898        | 4.0   | 750  | 0.6050   | 1.5454          |
| 0.5309        | 5.0   | 937  | 0.6026   | 1.6439          |
| 0.4985        | 6.0   | 1125 | 0.6034   | 1.7220          |
| 0.5091        | 7.0   | 1312 | 0.6008   | 1.8008          |
| 0.4796        | 8.0   | 1500 | 0.6001   | 1.7782          |
| 0.4453        | 9.0   | 1687 | 0.5985   | 1.8255          |
| 0.448         | 10.0  | 1875 | 0.5931   | 1.7979          |
| 0.4522        | 11.0  | 2062 | 0.5959   | 1.8272          |
| 0.4552        | 12.0  | 2250 | 0.5946   | 1.8670          |
| 0.4551        | 13.0  | 2437 | 0.5950   | 1.8706          |
| 0.4559        | 14.0  | 2625 | 0.5925   | 1.8731          |
| 0.4581        | 15.0  | 2812 | 0.5932   | 1.8531          |
| 0.4535        | 16.0  | 3000 | 0.5923   | 1.9492          |
| 0.4308        | 17.0  | 3187 | 0.5915   | 1.8944          |
| 0.4312        | 18.0  | 3375 | 0.5904   | 1.9315          |
| 0.4372        | 19.0  | 3562 | 0.5899   | 1.9201          |
| 0.4359        | 20.0  | 3750 | 0.5895   | 1.9753          |
| 0.4363        | 21.0  | 3937 | 0.5877   | 1.9932          |
| 0.4404        | 22.0  | 4125 | 0.5866   | 2.0326          |
| 0.4436        | 23.0  | 4312 | 0.5848   | 2.0008          |
| 0.4438        | 24.0  | 4500 | 0.5877   | 2.0186          |
| 0.4233        | 25.0  | 4687 | 0.5863   | 2.0452          |
| 0.4237        | 26.0  | 4875 | 0.5843   | 2.0520          |
| 0.4289        | 27.0  | 5062 | 0.5828   | 2.0817          |
| 0.4325        | 28.0  | 5250 | 0.5833   | 2.0512          |
| 0.4329        | 29.0  | 5437 | 0.5828   | 2.0906          |
| 0.4314        | 30.0  | 5625 | 0.5824   | 2.0403          |
| 0.431         | 31.0  | 5812 | 0.5824   | 2.1194          |
| 0.4318        | 32.0  | 6000 | 0.5829   | 2.0985          |
| 0.414         | 33.0  | 6187 | 0.5805   | 2.1533          |
| 0.4214        | 34.0  | 6375 | 0.5779   | 2.1918          |
| 0.4264        | 35.0  | 6562 | 0.5774   | 2.1835          |
| 0.4361        | 36.0  | 6750 | 0.5771   | 2.1864          |
| 0.4369        | 37.0  | 6937 | 0.5761   | 2.1546          |
| 0.4362        | 38.0  | 7125 | 0.5752   | 2.1423          |
| 0.4322        | 39.0  | 7312 | 0.5778   | 2.1938          |
| 0.4359        | 40.0  | 7500 | 0.5752   | 2.2000          |
| 0.4153        | 41.0  | 7687 | 0.5751   | 2.2344          |
| 0.4195        | 42.0  | 7875 | 0.5747   | 2.2526          |
| 0.9164        | 43.0  | 8062 | 0.5717   | 2.1985          |
| 0.4295        | 44.0  | 8250 | 0.5718   | 2.2145          |
| 0.4298        | 45.0  | 8437 | 0.5714   | 2.2211          |
| 0.4446        | 46.0  | 8625 | 0.5703   | 2.2656          |
| 2.0935        | 47.0  | 8812 | 0.5081   | 2.6962          |
| 3.096         | 48.0  | 9000 | 0.4494   | 3.2961          |
| 2.9615        | 49.0  | 9187 | 0.4241   | 4.3483          |
| 4.5736        | 49.87 | 9350 | 0.3655   | 5.5751          |
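Note that the run diverges late in training: validation loss bottoms out after epoch 1 (1.2683) and climbs steadily, with training-loss spikes around epochs 43 and 47-49, so the headline evaluation numbers above come from the final, diverged checkpoint. Since the validation loss is mean token cross-entropy, perplexity is `exp(loss)`; a quick comparison of the best and final checkpoints:

```python
import math

# Perplexity = exp(mean token cross-entropy), using the table's values.
best_ppl = math.exp(1.2683)   # epoch 1 checkpoint  -> ~3.6
final_ppl = math.exp(5.5751)  # final checkpoint    -> ~264
print(f"best: {best_ppl:.2f}, final: {final_ppl:.2f}")
```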

### Framework versions

- Transformers 4.34.0
- Pytorch 2.1.0+cu121
- Datasets 2.18.0
- Tokenizers 0.14.1