See axolotl config

axolotl version: 0.4.0

base_model: maywell/Llama-3-Ko-Luxia-Instruct
trust_remote_code: true
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: "../data/generated_ds.json"
    type: alpaca
    conversation: chatml
dataset_prepared_path: ../data/dataset_v2_pre
val_set_size: 0.05
output_dir: ../data/output/1min-v2-luxia-8b
sequence_len: 1024
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false
wandb_project: 
wandb_entity: 
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 10
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 2e-6
train_on_inputs: false
group_by_length: false
bf16: auto
fp16: null
tf32: false
gradient_checkpointing: true
early_stopping_patience: null
resume_from_checkpoint: null
local_rank: null
logging_steps: 1
xformers_attention: null
flash_attention: true
warmup_steps: 10
evals_per_epoch: 4
eval_table_size: null
eval_max_new_tokens: 128
saves_per_epoch: 1
save_total_limit: 4
debug: true
deepspeed: deepspeed_configs/zero2.json
weight_decay: 0.0
special_tokens:
  pad_token: <|end_of_text|>
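
The `learning_rate: 2e-6`, `lr_scheduler: cosine`, and `warmup_steps: 10` settings above imply a learning rate that ramps up linearly for 10 optimizer steps and then decays along a half cosine. The sketch below shows that shape in plain Python. The function name `cosine_lr_with_warmup` is hypothetical, and `total_steps=200` is an assumption (about 20 optimizer steps per epoch over 10 epochs, as the training results table suggests); the schedule the trainer actually applied, especially under DeepSpeed, may differ in detail.

```python
import math


def cosine_lr_with_warmup(step, base_lr=2e-6, warmup_steps=10, total_steps=200):
    """Learning rate after `step` optimizer updates.

    Linear warmup from 0 to base_lr, then half-cosine decay to 0.
    total_steps=200 is an assumed value, not taken from the config.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Print the schedule at a few representative steps.
for step in (0, 5, 10, 105, 200):
    print(step, f"{cosine_lr_with_warmup(step):.3e}")
```

With these assumptions the peak of 2e-6 is reached at step 10, and the rate falls to half of that midway through the decay phase.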

data/output/1min-v2-luxia-8b

This model is a fine-tuned version of maywell/Llama-3-Ko-Luxia-Instruct, trained on a modified version of the instructkr/ko_youtube_transcription_v2_filtered dataset. It achieves the following results on the evaluation set:

Loss: 2.0986

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-06
train_batch_size: 1
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 7
gradient_accumulation_steps: 4
total_train_batch_size: 28
total_eval_batch_size: 7
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 10
num_epochs: 10
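
The total batch sizes listed above follow from the per-device values. As a quick check, the arithmetic can be written out as below; `sequence_len` comes from the axolotl config, and the token count is only an upper bound that assumes every packed sequence is completely full.

```python
# Values taken from the hyperparameter list and the axolotl config.
micro_batch_size = 1             # train_batch_size per device
eval_batch_size = 1              # eval_batch_size per device
gradient_accumulation_steps = 4
num_devices = 7
sequence_len = 1024

# Sequences contributing to one optimizer update across all GPUs.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices

# Evaluation uses no gradient accumulation.
total_eval_batch_size = eval_batch_size * num_devices

# Upper bound on tokens per optimizer update (assumes fully packed sequences).
max_tokens_per_step = total_train_batch_size * sequence_len

print(total_train_batch_size)  # 28
print(total_eval_batch_size)   # 7
print(max_tokens_per_step)     # 28672
```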

Training results

Training Loss	Epoch	Step	Validation Loss
2.6145	0.0513	1	2.7217
2.7668	0.2564	5	2.7018
2.6304	0.5128	10	2.5065
2.3635	0.7692	15	2.3580
2.4553	1.0256	20	2.2813
2.2344	1.2436	25	2.2339
2.4562	1.5	30	2.2017
2.0943	1.7564	35	2.1726
2.0695	2.0128	40	2.1425
1.8616	2.2308	45	2.1171
2.0498	2.4872	50	2.1040
1.9028	2.7436	55	2.0984
1.9057	3.0	60	2.0841
1.7464	3.2179	65	2.0784
1.8284	3.4744	70	2.0788
1.8866	3.7308	75	2.0761
1.8927	3.9872	80	2.0673
1.5778	4.2051	85	2.0779
1.7274	4.4615	90	2.0934
1.7431	4.7179	95	2.0652
1.8728	4.9744	100	2.0618
1.5729	5.1923	105	2.0837
1.4631	5.4487	110	2.0873
1.4758	5.7051	115	2.0744
1.5289	5.9615	120	2.0899
1.515	6.1795	125	2.0919
1.5757	6.4359	130	2.0978
1.5392	6.6923	135	2.0986
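
In the table above, validation loss stops improving partway through the run while training loss keeps falling, a pattern often read as the onset of overfitting. The sketch below copies the rows from the table and locates the evaluation step with the lowest validation loss. Whether a checkpoint exists at that step is not stated in this card (the config saves once per epoch), so this is only a way to read the table, not a statement about which weights were published.

```python
# (step, training loss, validation loss), copied from the results table.
rows = [
    (1, 2.6145, 2.7217), (5, 2.7668, 2.7018), (10, 2.6304, 2.5065),
    (15, 2.3635, 2.3580), (20, 2.4553, 2.2813), (25, 2.2344, 2.2339),
    (30, 2.4562, 2.2017), (35, 2.0943, 2.1726), (40, 2.0695, 2.1425),
    (45, 1.8616, 2.1171), (50, 2.0498, 2.1040), (55, 1.9028, 2.0984),
    (60, 1.9057, 2.0841), (65, 1.7464, 2.0784), (70, 1.8284, 2.0788),
    (75, 1.8866, 2.0761), (80, 1.8927, 2.0673), (85, 1.5778, 2.0779),
    (90, 1.7274, 2.0934), (95, 1.7431, 2.0652), (100, 1.8728, 2.0618),
    (105, 1.5729, 2.0837), (110, 1.4631, 2.0873), (115, 1.4758, 2.0744),
    (120, 1.5289, 2.0899), (125, 1.515, 2.0919), (130, 1.5757, 2.0978),
    (135, 1.5392, 2.0986),
]

# Row with the lowest validation loss, and the last logged row.
best_step, _, best_val = min(rows, key=lambda r: r[2])
last_step, _, last_val = rows[-1]

print(f"lowest validation loss {best_val} at step {best_step}")
print(f"final validation loss {last_val} at step {last_step}")
print(f"difference: {last_val - best_val:.4f}")
```

The final row matches the evaluation loss reported at the top of the card (2.0986), which is slightly higher than the minimum reached at step 100.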

Framework versions

Transformers 4.40.2
Pytorch 2.1.2+cu118
Datasets 2.19.1
Tokenizers 0.19.1

esunn/1min-v2-luxia-8b
