metadata

library_name: peft
license: llama3.2
base_model: meta-llama/Llama-3.2-3B
tags:
  - generated_from_trainer
model-index:
  - name: outputs/dippy-2
    results: []

Axolotl config

axolotl version: 0.5.0

base_model: meta-llama/Llama-3.2-3B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

#wget -O dataset_2000.jsonl http://94.130.230.31/dataset_2000.jsonl
chat_template: llama3
datasets:
  - path: ./dataset_2000.jsonl
    type: chat_template
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/dippy-2

sequence_len: 4096
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save:
  - embed_tokens
  - lm_head

wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 12
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16: true
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
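The adapter settings above (lora_r: 32, lora_alpha: 16, lora_target_linear: true) follow standard LoRA. Each targeted linear layer keeps its frozen weight W and learns a low-rank update B·A that is scaled by lora_alpha / lora_r. The minimal NumPy sketch below illustrates that forward pass. It assumes PEFT's default (non-rsLoRA) scaling, omits lora_dropout, and uses toy layer sizes chosen only for illustration. It is not PEFT's implementation. The modules listed under lora_modules_to_save (embed_tokens, lm_head) are trained as full copies rather than through a low-rank update, so they are not covered here.

```python
import numpy as np

def lora_linear(x, W, A, B, lora_alpha, r):
    """Forward pass of a LoRA-adapted linear layer (illustrative sketch, no dropout).

    x: (batch, in_features) input
    W: (out_features, in_features) frozen base weight
    A: (r, in_features) trainable down-projection
    B: (out_features, r) trainable up-projection
    """
    scaling = lora_alpha / r      # 16 / 32 = 0.5 with the config above
    base = x @ W.T                # frozen base-layer output
    update = (x @ A.T) @ B.T      # low-rank path: in_features -> r -> out_features
    return base + scaling * update

rng = np.random.default_rng(0)
in_features, out_features, r = 64, 48, 32   # toy sizes; r matches lora_r
x = rng.standard_normal((2, in_features))
W = rng.standard_normal((out_features, in_features))
A = rng.standard_normal((r, in_features)) * 0.01
B = np.zeros((out_features, r))              # B starts at zero, so the adapter is initially a no-op

y = lora_linear(x, W, A, B, lora_alpha=16, r=r)
print(y.shape)
print(np.allclose(y, x @ W.T))  # a zero B leaves the base output unchanged
```

Because B starts at zero, the adapted model initially reproduces the base model, and lora_alpha / lora_r only rescales how strongly the learned update contributes.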

outputs/dippy-2

This model is a LoRA fine-tuned version of meta-llama/Llama-3.2-3B, trained on a local chat-format dataset (./dataset_2000.jsonl in the config above). It achieves the following results on the evaluation set:

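Loss: 3.0961 (final checkpoint; the lowest validation loss in the training results below is 1.6559, at epoch 1.0383)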
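Evaluation loss is often easier to read as perplexity. Assuming the reported value is the mean per-token cross-entropy in nats (the usual causal-LM loss in Transformers), perplexity is simply exp(loss). This short sketch converts both the final loss and the lowest loss from the results table:

```python
import math

def perplexity(mean_ce_loss: float) -> float:
    """Convert a mean token-level cross-entropy (natural log) into perplexity."""
    return math.exp(mean_ce_loss)

print(f"final checkpoint: {perplexity(3.0961):.2f}")  # final reported eval loss
print(f"lowest in table:  {perplexity(1.6559):.2f}")  # eval at epoch 1.0383, step 68
```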

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 8
optimizer: adamw_bnb_8bit (OptimizerNames.ADAMW_BNB) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 10
num_epochs: 12
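The batch size and learning-rate settings above can be checked with a little arithmetic. The sketch below assumes a single device and the standard Hugging Face cosine-with-warmup shape (linear warmup, then a half-cycle cosine decay to zero, as in `get_cosine_schedule_with_warmup` with its default cycle count). Axolotl offers other cosine variants, so treat this as an approximation. With sample_packing enabled, each micro-batch element is a packed sequence of up to 4096 tokens, not a single raw example. The total step count is not stated in this card; the value 786 used below is a hypothetical estimate read off the results table (roughly 65 optimizer steps per epoch over 12 epochs).

```python
import math

def effective_batch_size(micro_batch_size: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Sequences contributing to one optimizer step (packed sequences when sample_packing is on)."""
    return micro_batch_size * grad_accum_steps * num_devices

def cosine_lr(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Linear warmup to peak_lr, then half-cycle cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

print(effective_batch_size(2, 4))  # 8, matching total_train_batch_size

# total_steps=786 is an assumed estimate, not a value from the training logs.
for step in (0, 5, 10, 398, 786):
    print(step, f"{cosine_lr(step, 2e-4, 10, 786):.2e}")
```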

Training results

Training Loss	Epoch	Step	Validation Loss
1.9507	0.0153	1	1.9943
1.714	0.2605	17	1.7193
1.5507	0.5211	34	1.7040
1.6354	0.7816	51	1.6666
0.9188	1.0383	68	1.6559
0.8897	1.2989	85	1.6953
0.9014	1.5594	102	1.7119
0.8517	1.8199	119	1.7209
0.4448	2.0843	136	1.7969
0.4053	2.3448	153	1.8347
0.3723	2.6054	170	1.8777
0.339	2.8659	187	1.8751
0.1614	3.1264	204	2.0658
0.1804	3.3870	221	2.0643
0.1881	3.6475	238	2.0924
0.1762	3.9080	255	2.0624
0.195	4.1686	272	2.3268
0.0649	4.4291	289	2.2718
0.0786	4.6897	306	2.2569
0.0763	4.9502	323	2.2521
0.0509	5.2107	340	2.4546
0.0374	5.4713	357	2.4693
0.0216	5.7318	374	2.4763
0.0272	5.9923	391	2.5110
0.0117	6.2490	408	2.7330
0.0115	6.5096	425	2.6403
0.0092	6.7701	442	2.7747
0.0064	7.0268	459	2.7342
0.0059	7.2874	476	2.8930
0.0065	7.5479	493	2.9133
0.0059	7.8084	510	2.9216
0.0058	8.0690	527	2.9435
0.0046	8.3295	544	3.0068
0.0051	8.5900	561	3.0261
0.0044	8.8506	578	3.0278
0.0035	9.1073	595	3.0368
0.0038	9.3678	612	3.0577
0.004	9.6284	629	3.0710
0.0041	9.8889	646	3.0796
0.0038	10.1533	663	3.0823
0.0039	10.4138	680	3.0844
0.0041	10.6743	697	3.0886
0.004	10.9349	714	3.0952
0.0038	11.1992	731	3.0955
0.0033	11.4598	748	3.0949
0.0044	11.7203	765	3.0961
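In the table above, validation loss bottoms out early and then climbs steadily while training loss keeps falling, so the choice of checkpoint matters. The Python sketch below parses the logged rows (copied verbatim from the table) and finds the evaluation step with the lowest validation loss. It only reads the logged numbers. With saves_per_epoch: 1, a saved checkpoint may not coincide exactly with that evaluation step.

```python
from typing import NamedTuple

class EvalRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float

# Copied from the training results table: training loss, epoch, step, validation loss.
RESULTS = """
1.9507 0.0153 1 1.9943
1.714 0.2605 17 1.7193
1.5507 0.5211 34 1.7040
1.6354 0.7816 51 1.6666
0.9188 1.0383 68 1.6559
0.8897 1.2989 85 1.6953
0.9014 1.5594 102 1.7119
0.8517 1.8199 119 1.7209
0.4448 2.0843 136 1.7969
0.4053 2.3448 153 1.8347
0.3723 2.6054 170 1.8777
0.339 2.8659 187 1.8751
0.1614 3.1264 204 2.0658
0.1804 3.3870 221 2.0643
0.1881 3.6475 238 2.0924
0.1762 3.9080 255 2.0624
0.195 4.1686 272 2.3268
0.0649 4.4291 289 2.2718
0.0786 4.6897 306 2.2569
0.0763 4.9502 323 2.2521
0.0509 5.2107 340 2.4546
0.0374 5.4713 357 2.4693
0.0216 5.7318 374 2.4763
0.0272 5.9923 391 2.5110
0.0117 6.2490 408 2.7330
0.0115 6.5096 425 2.6403
0.0092 6.7701 442 2.7747
0.0064 7.0268 459 2.7342
0.0059 7.2874 476 2.8930
0.0065 7.5479 493 2.9133
0.0059 7.8084 510 2.9216
0.0058 8.0690 527 2.9435
0.0046 8.3295 544 3.0068
0.0051 8.5900 561 3.0261
0.0044 8.8506 578 3.0278
0.0035 9.1073 595 3.0368
0.0038 9.3678 612 3.0577
0.004 9.6284 629 3.0710
0.0041 9.8889 646 3.0796
0.0038 10.1533 663 3.0823
0.0039 10.4138 680 3.0844
0.0041 10.6743 697 3.0886
0.004 10.9349 714 3.0952
0.0038 11.1992 731 3.0955
0.0033 11.4598 748 3.0949
0.0044 11.7203 765 3.0961
"""

def parse_results(text: str) -> list[EvalRow]:
    """Turn whitespace-separated table rows into typed records."""
    rows = []
    for line in text.strip().splitlines():
        train, epoch, step, val = line.split()
        rows.append(EvalRow(float(train), float(epoch), int(step), float(val)))
    return rows

rows = parse_results(RESULTS)
best = min(rows, key=lambda row: row.val_loss)
final = rows[-1]
print(f"lowest val loss {best.val_loss} at step {best.step} (epoch {best.epoch})")
print(f"final val loss {final.val_loss} at step {final.step} (epoch {final.epoch})")
```

The config leaves early_stopping_patience unset, so training continued well past this minimum.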

Framework versions

PEFT 0.13.2
Transformers 4.46.3
Pytorch 2.5.1+cu124
Datasets 3.1.0
Tokenizers 0.20.3
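To reproduce this environment, it can help to confirm that the installed packages match the versions listed above. The sketch below uses the standard-library `importlib.metadata`. It assumes PyTorch is installed under its distribution name `torch` and compares version strings literally, so a build without the `+cu124` suffix is reported as a mismatch.

```python
from importlib import metadata

# Versions listed under "Framework versions"; PyTorch is distributed as "torch".
EXPECTED = {
    "peft": "0.13.2",
    "transformers": "4.46.3",
    "torch": "2.5.1+cu124",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}

def version_mismatches(expected, get_version=metadata.version):
    """Return {package: installed version or None} for each package that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches

if __name__ == "__main__":
    for name, installed in version_mismatches(EXPECTED).items():
        print(f"{name}: expected {EXPECTED[name]}, found {installed or 'not installed'}")
```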