metadata

license: other
base_model: meta-llama/Meta-Llama-3-8B
tags:
  - llama-factory
  - full
  - generated_from_trainer
model-index:
  - name: C014_llama3-8b-base_pretrain_20240428_005832
    results: []

C014_llama3-8b-base_pretrain_20240428_005832

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B (loaded during training from the local path /mnt/models-pku/progressalign/shared_storage/downloaded_models/llama3-8b-base) on the C014_data dataset. It achieves the following results on the evaluation set:

Loss: 2.2045
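
For readers who prefer perplexity, the loss above can be converted with a one-line calculation. This is a minimal sketch that assumes the reported value is the mean per-token cross-entropy in nats, which is the usual convention for causal language models in Transformers but is not stated in this card.

```python
import math

# Evaluation loss reported in this card.
eval_loss = 2.2045

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity = {perplexity:.2f}")
```

Under that assumption the evaluation perplexity is roughly 9.07.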

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1.5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 64
total_eval_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: polynomial
lr_scheduler_warmup_steps: 20
num_epochs: 4.0
mixed_precision_training: Native AMP
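
The totals and the schedule listed above can be reproduced with a short calculation. The sketch below is illustrative only and rests on several assumptions: no gradient accumulation (consistent with 8 x 8 = 64), 66 optimizer steps per epoch (inferred from the results table, where step 1 corresponds to epoch 0.0152), and a linear polynomial decay (`power=1.0`) ending at `lr_end=1e-7`, which are believed to be the Transformers defaults for this scheduler type. The function `polynomial_lr` is a hypothetical helper, not code from the training run.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8       # per device
eval_batch_size = 16       # per device
num_devices = 8
learning_rate = 1.5e-05
warmup_steps = 20
num_epochs = 4

# Assumption: gradient accumulation is 1, so the effective batch size
# is the per-device batch size times the number of devices.
total_train_batch_size = train_batch_size * num_devices
total_eval_batch_size = eval_batch_size * num_devices

# Assumption: steps per epoch inferred from the results table
# (step 1 is logged at epoch 0.0152, and 1 / 0.0152 rounds to 66).
steps_per_epoch = round(1 / 0.0152)
total_steps = steps_per_epoch * num_epochs


def polynomial_lr(step, lr_init=learning_rate, lr_end=1e-7, power=1.0):
    """Hypothetical re-implementation of polynomial decay with linear warmup.

    lr_end and power are assumed defaults, not values stated in this card.
    """
    if step < warmup_steps:
        # Linear warmup from 0 to lr_init.
        return lr_init * step / max(1, warmup_steps)
    if step > total_steps:
        return lr_end
    # Polynomial decay from lr_init down to lr_end.
    decay_steps = total_steps - warmup_steps
    pct_remaining = 1 - (step - warmup_steps) / decay_steps
    return (lr_init - lr_end) * pct_remaining ** power + lr_end


print(total_train_batch_size, total_eval_batch_size, total_steps)
for step in (0, 10, 20, 130, 264):
    print(step, polynomial_lr(step))
```

With these assumptions the learning rate peaks at 1.5e-05 at step 20 and decays to the assumed floor by step 264.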

Training results

Training Loss	Epoch	Step	Validation Loss
2.5789	0.0152	1	2.6458
2.5672	0.0758	5	2.6280
2.5751	0.1515	10	2.5314
2.418	0.2273	15	2.4634
2.4701	0.3030	20	2.4177
2.3904	0.3788	25	2.3785
2.3539	0.4545	30	2.3378
2.3101	0.5303	35	2.3082
2.3254	0.6061	40	2.2816
2.2762	0.6818	45	2.2614
2.2525	0.7576	50	2.2458
2.2777	0.8333	55	2.2321
2.2054	0.9091	60	2.2206
2.237	0.9848	65	2.2113
1.986	1.0606	70	2.2115
1.9373	1.1364	75	2.2217
1.9228	1.2121	80	2.2132
1.9084	1.2879	85	2.2118
1.9684	1.3636	90	2.2122
1.9126	1.4394	95	2.2094
1.9101	1.5152	100	2.2066
1.8496	1.5909	105	2.2058
1.9154	1.6667	110	2.2057
1.9233	1.7424	115	2.2056
1.9198	1.8182	120	2.2052
1.9229	1.8939	125	2.2048
1.8913	1.9697	130	2.2045
1.8814	2.0455	135	2.2046
1.8813	2.1212	140	2.2051
1.8912	2.1970	145	2.2058
1.9184	2.2727	150	2.2065
1.8662	2.3485	155	2.2071
1.8809	2.4242	160	2.2074
1.8591	2.5	165	2.2077
1.8731	2.5758	170	2.2079
1.8948	2.6515	175	2.2082
1.8876	2.7273	180	2.2082
1.8408	2.8030	185	2.2083
1.8931	2.8788	190	2.2082
1.8569	2.9545	195	2.2080
1.8621	3.0303	200	2.2079
1.8863	3.1061	205	2.2078
1.9021	3.1818	210	2.2079
1.8648	3.2576	215	2.2080
1.8443	3.3333	220	2.2081
1.8978	3.4091	225	2.2080
1.8658	3.4848	230	2.2080
1.8706	3.5606	235	2.2079
1.8855	3.6364	240	2.2078
1.8535	3.7121	245	2.2078
1.9062	3.7879	250	2.2079
1.8628	3.8636	255	2.2078
1.8484	3.9394	260	2.2077
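
The table can be scanned programmatically to locate the best validation checkpoint and the final gap between training and validation loss. The sketch below embeds only a handful of rows copied from the table for brevity; the variable names are illustrative.

```python
# A subset of rows copied from the table above:
# training loss, epoch, step, validation loss.
rows_text = """\
2.5789 0.0152 1 2.6458
2.237 0.9848 65 2.2113
1.986 1.0606 70 2.2115
1.8913 1.9697 130 2.2045
1.8569 2.9545 195 2.2080
1.8484 3.9394 260 2.2077
"""

rows = []
for line in rows_text.splitlines():
    train_loss, epoch, step, val_loss = line.split()
    rows.append((float(train_loss), float(epoch), int(step), float(val_loss)))

# Row with the lowest validation loss.
best = min(rows, key=lambda r: r[3])

# Gap between validation and training loss at the last logged step.
last = rows[-1]
final_gap = last[3] - last[0]

print(f"best validation loss {best[3]:.4f} at step {best[2]}")
print(f"final validation - training gap: {final_gap:.4f}")
```

Across the full table the lowest validation loss is 2.2045 at step 130, which matches the evaluation loss reported at the top of this card. Validation loss stays essentially flat after the first epoch while training loss drops to about 1.85 to 1.90, leaving a gap of roughly 0.36 at step 260. The card does not state which checkpoint was released.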

Framework versions

Transformers 4.40.0
Pytorch 2.1.2+cu121
Datasets 2.18.0
Tokenizers 0.19.1
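
To compare a local environment against the versions listed above, something like the following can be used. It is a minimal sketch that relies only on the standard library; the PyPI distribution names (`transformers`, `torch`, `datasets`, `tokenizers`) are assumed to correspond to the frameworks named in this card, and `check_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "transformers": "4.40.0",
    "torch": "2.1.2+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.19.1",
}


def check_versions(expected=EXPECTED):
    """Return {package: (expected_version, installed_version_or_None)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in check_versions().items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; it only flags a difference from the training environment.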