metadata
tags:
- generated_from_trainer
model-index:
- name: vicuna_13b_stage1
results: []
vicuna_13b_stage1
According to the auto-generated card, this model was trained from scratch; the training dataset was not recorded (it appears as None). It achieves the following results on the evaluation set:
- Loss: 1.2017
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 40
- num_epochs: 1
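The batch-size and scheduler settings above can be checked with a little arithmetic. The sketch below recomputes the total train batch size from the per-device batch size, device count, and gradient accumulation steps, and evaluates a linear-warmup cosine schedule at a few steps. The total step count (`ASSUMED_TOTAL_STEPS`) is not stated in the card; it is a hypothetical value chosen only for illustration, and the schedule formula is the common warmup-plus-half-cosine shape, not a confirmed reproduction of the training run.

```python
import math

# Values taken from the hyperparameter list above.
PEAK_LR = 5e-4
WARMUP_STEPS = 40
PER_DEVICE_TRAIN_BATCH = 2
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 4

# Assumption: the card does not report the total number of optimizer steps.
# This value is hypothetical and only used to draw the shape of the schedule.
ASSUMED_TOTAL_STEPS = 1980


def effective_batch_size(per_device, num_devices, grad_accum):
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum


def cosine_lr(step, total_steps, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    total = effective_batch_size(PER_DEVICE_TRAIN_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS)
    print("total_train_batch_size:", total)
    for step in (0, 20, 40, 1000, ASSUMED_TOTAL_STEPS):
        print(step, cosine_lr(step, ASSUMED_TOTAL_STEPS))
```

The product 2 x 2 x 4 matches the reported total_train_batch_size of 16, and the eval total of 4 follows from 2 x 2 because gradient accumulation does not apply to evaluation.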
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
1.9535 | 0.02 | 40 | 1.9456 |
1.8556 | 0.04 | 80 | 1.7714 |
1.791 | 0.06 | 120 | 1.7425 |
1.6622 | 0.08 | 160 | 1.7164 |
1.8169 | 0.1 | 200 | 1.7154 |
1.7356 | 0.12 | 240 | 1.7026 |
1.6051 | 0.14 | 280 | 1.7104 |
1.7925 | 0.16 | 320 | 1.7127 |
1.8257 | 0.18 | 360 | 1.7055 |
1.7057 | 0.2 | 400 | 1.6906 |
1.9282 | 0.22 | 440 | 1.6746 |
1.668 | 0.24 | 480 | 1.7052 |
1.6273 | 0.26 | 520 | 1.6620 |
1.6136 | 0.28 | 560 | 1.6616 |
1.4754 | 0.3 | 600 | 1.6389 |
1.4024 | 0.32 | 640 | 1.6038 |
1.6773 | 0.34 | 680 | 1.5743 |
1.6008 | 0.36 | 720 | 1.5607 |
1.568 | 0.39 | 760 | 1.5236 |
1.4922 | 0.41 | 800 | 1.5158 |
1.4667 | 0.43 | 840 | 1.4938 |
1.5653 | 0.45 | 880 | 1.4692 |
1.331 | 0.47 | 920 | 1.4581 |
1.4019 | 0.49 | 960 | 1.4290 |
1.4925 | 0.51 | 1000 | 1.4087 |
1.4772 | 0.53 | 1040 | 1.3961 |
1.4728 | 0.55 | 1080 | 1.3817 |
1.4555 | 0.57 | 1120 | 1.3559 |
1.5487 | 0.59 | 1160 | 1.3399 |
1.3888 | 0.61 | 1200 | 1.3212 |
1.2544 | 0.63 | 1240 | 1.3099 |
1.2657 | 0.65 | 1280 | 1.2972 |
1.3641 | 0.67 | 1320 | 1.2815 |
1.2915 | 0.69 | 1360 | 1.2687 |
1.4182 | 0.71 | 1400 | 1.2541 |
1.2515 | 0.73 | 1440 | 1.2427 |
1.2287 | 0.75 | 1480 | 1.2352 |
1.1886 | 0.77 | 1520 | 1.2285 |
1.2651 | 0.79 | 1560 | 1.2219 |
1.3341 | 0.81 | 1600 | 1.2145 |
1.2357 | 0.83 | 1640 | 1.2107 |
1.0767 | 0.85 | 1680 | 1.2080 |
1.2158 | 0.87 | 1720 | 1.2051 |
1.2042 | 0.89 | 1760 | 1.2034 |
1.1887 | 0.91 | 1800 | 1.2023 |
1.2662 | 0.93 | 1840 | 1.2018 |
1.1866 | 0.95 | 1880 | 1.2017 |
1.1798 | 0.97 | 1920 | 1.2017 |
1.336 | 0.99 | 1960 | 1.2017 |
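One way to read the validation losses in the table is to convert them to perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which is typical for causal language model training but is not confirmed by the card; the function name `perplexity` is introduced here for illustration.

```python
import math


def perplexity(mean_cross_entropy_nats):
    """Perplexity is exp(loss) when loss is mean per-token cross-entropy in nats."""
    return math.exp(mean_cross_entropy_nats)


# Validation losses copied from the first and last rows of the table above.
FIRST_VAL_LOSS = 1.9456  # step 40
FINAL_VAL_LOSS = 1.2017  # step 1960

if __name__ == "__main__":
    print("first validation perplexity:", round(perplexity(FIRST_VAL_LOSS), 3))
    # The final value comes out at about 3.33 under the stated assumption.
    print("final validation perplexity:", round(perplexity(FINAL_VAL_LOSS), 3))
```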
Framework versions
- Transformers 4.34.1
- PyTorch 2.3.1+cu121
- Datasets 2.14.7
- Tokenizers 0.14.1
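To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch that uses only the standard library; the distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are the usual pip names and are an assumption here, and the helper names are hypothetical. Depending on how PyTorch was installed, its reported version may or may not include the `+cu121` suffix, so a mismatch on that entry is not necessarily a problem.

```python
from importlib import metadata

# Versions copied from the list above; the keys are assumed pip distribution names.
EXPECTED = {
    "transformers": "4.34.1",
    "torch": "2.3.1+cu121",
    "datasets": "2.14.7",
    "tokenizers": "0.14.1",
}


def installed_versions(packages):
    """Return {name: version or None} for each distribution name."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


def version_mismatches(expected, installed):
    """Return {name: (expected, installed)} for entries that differ."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    diffs = version_mismatches(EXPECTED, installed_versions(EXPECTED))
    for name, (want, have) in diffs.items():
        print(f"{name}: card lists {want}, found {have}")
    if not diffs:
        print("Environment matches the versions listed in the card.")
```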