lemexp-task4-option2_small-deepseek-coder-1.3b-base-ddp-8lr

This model is a fine-tuned version of deepseek-ai/deepseek-coder-1.3b-base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0643

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0008
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 16
  • total_eval_batch_size: 16
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 12
  • mixed_precision_training: Native AMP
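
As a rough illustration, these values map onto a Hugging Face TrainingArguments configuration along the following lines. This is a minimal sketch, not the actual training script: the output_dir and any argument not named on the card are assumptions.

```python
# Sketch only: mirrors the hyperparameters listed above. output_dir and
# anything not stated on the card are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="lemexp-task4-option2_small-deepseek-coder-1.3b-base-ddp-8lr",
    learning_rate=8e-4,
    per_device_train_batch_size=2,   # x 8 devices = total train batch size 16
    per_device_eval_batch_size=2,    # x 8 devices = total eval batch size 16
    seed=42,
    optim="adamw_torch",             # AdamW with betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=12,
    fp16=True,                       # Native AMP mixed-precision training
)
```

Launching the script with something like `torchrun --nproc_per_node=8 train.py` would give the multi-GPU DDP setup implied by distributed_type and num_devices (`train.py` is a placeholder name).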

Training results

| Training Loss | Epoch   | Step  | Validation Loss |
|:-------------:|:-------:|:-----:|:---------------:|
| 0.1964        | 0.2001  | 629   | 0.1207          |
| 0.1225        | 0.4001  | 1258  | 0.1088          |
| 0.1108        | 0.6002  | 1887  | 0.0987          |
| 0.1024        | 0.8003  | 2516  | 0.0969          |
| 0.0983        | 1.0003  | 3145  | 0.0961          |
| 0.0955        | 1.2004  | 3774  | 0.0957          |
| 0.0941        | 1.4004  | 4403  | 0.0907          |
| 0.09          | 1.6005  | 5032  | 0.0873          |
| 0.0908        | 1.8006  | 5661  | 0.0866          |
| 0.0869        | 2.0006  | 6290  | 0.0929          |
| 0.0879        | 2.2007  | 6919  | 0.0852          |
| 0.0865        | 2.4008  | 7548  | 0.0827          |
| 0.0845        | 2.6008  | 8177  | 0.0800          |
| 0.087         | 2.8009  | 8806  | 0.0829          |
| 0.0835        | 3.0010  | 9435  | 0.0819          |
| 0.0811        | 3.2010  | 10064 | 0.0820          |
| 0.0802        | 3.4011  | 10693 | 0.0771          |
| 0.0794        | 3.6011  | 11322 | 0.0773          |
| 0.0794        | 3.8012  | 11951 | 0.0793          |
| 0.079         | 4.0013  | 12580 | 0.0795          |
| 0.0777        | 4.2013  | 13209 | 0.0781          |
| 0.0746        | 4.4014  | 13838 | 0.0748          |
| 0.076         | 4.6015  | 14467 | 0.0764          |
| 0.0769        | 4.8015  | 15096 | 0.0740          |
| 0.0748        | 5.0016  | 15725 | 0.0766          |
| 0.0734        | 5.2017  | 16354 | 0.0734          |
| 0.0734        | 5.4017  | 16983 | 0.0731          |
| 0.0713        | 5.6018  | 17612 | 0.0739          |
| 0.0733        | 5.8018  | 18241 | 0.0707          |
| 0.0712        | 6.0019  | 18870 | 0.0768          |
| 0.0705        | 6.2020  | 19499 | 0.0712          |
| 0.0692        | 6.4020  | 20128 | 0.0704          |
| 0.0697        | 6.6021  | 20757 | 0.0683          |
| 0.0681        | 6.8022  | 21386 | 0.0681          |
| 0.0688        | 7.0022  | 22015 | 0.0694          |
| 0.0668        | 7.2023  | 22644 | 0.0690          |
| 0.0663        | 7.4024  | 23273 | 0.0682          |
| 0.0669        | 7.6024  | 23902 | 0.0676          |
| 0.0656        | 7.8025  | 24531 | 0.0680          |
| 0.0656        | 8.0025  | 25160 | 0.0675          |
| 0.0635        | 8.2026  | 25789 | 0.0673          |
| 0.0628        | 8.4027  | 26418 | 0.0660          |
| 0.0643        | 8.6027  | 27047 | 0.0662          |
| 0.0632        | 8.8028  | 27676 | 0.0651          |
| 0.0632        | 9.0029  | 28305 | 0.0650          |
| 0.062         | 9.2029  | 28934 | 0.0657          |
| 0.0613        | 9.4030  | 29563 | 0.0659          |
| 0.0611        | 9.6031  | 30192 | 0.0661          |
| 0.0612        | 9.8031  | 30821 | 0.0646          |
| 0.0613        | 10.0032 | 31450 | 0.0633          |
| 0.0589        | 10.2032 | 32079 | 0.0642          |
| 0.0597        | 10.4033 | 32708 | 0.0640          |
| 0.0592        | 10.6034 | 33337 | 0.0633          |
| 0.0593        | 10.8034 | 33966 | 0.0633          |
| 0.0596        | 11.0035 | 34595 | 0.0631          |
| 0.0583        | 11.2036 | 35224 | 0.0636          |
| 0.0581        | 11.4036 | 35853 | 0.0638          |
| 0.0583        | 11.6037 | 36482 | 0.0647          |
| 0.0572        | 11.8038 | 37111 | 0.0643          |

Framework versions

  • PEFT 0.14.0
  • Transformers 4.47.0
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0
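
Since PEFT is listed among the framework versions and this repository is published as an adapter on deepseek-ai/deepseek-coder-1.3b-base, loading it for inference might look like the following. This is a minimal sketch, assuming the repo is a PEFT (likely LoRA) adapter; the prompt is hypothetical, as the card does not document the expected input format.

```python
# Minimal sketch: load the base model, then attach this PEFT adapter.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "deepseek-ai/deepseek-coder-1.3b-base"
adapter_id = "yalhessi/lemexp-task4-option2_small-deepseek-coder-1.3b-base-ddp-8lr"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the adapter

# Hypothetical prompt; the card does not specify the task's input format.
inputs = tokenizer("def quicksort(arr):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```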