license: llama3
base_model: maywell/Llama-3-Ko-Luxia-Instruct
tags:
  - generated_from_trainer
model-index:
  - name: data/output/1min-v2-luxia-8b
    results: []

Built with Axolotl

See axolotl config

axolotl version: 0.4.0

base_model: maywell/Llama-3-Ko-Luxia-Instruct
trust_remote_code: true
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
  - path: "../data/generated_ds.json"
    type: alpaca
    conversation: chatml
dataset_prepared_path: ../data/dataset_v2_pre
val_set_size: 0.05
output_dir: ../data/output/1min-v2-luxia-8b
sequence_len: 1024
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false
wandb_project: 
wandb_entity: 
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 10
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 2e-6
train_on_inputs: false
group_by_length: false
bf16: auto
fp16: null
tf32: false
gradient_checkpointing: true
early_stopping_patience: null
resume_from_checkpoint: null
local_rank: null
logging_steps: 1
xformers_attention: null
flash_attention: true
warmup_steps: 10
evals_per_epoch: 4
eval_table_size: null
eval_max_new_tokens: 128
saves_per_epoch: 1
save_total_limit: 4
debug: true
deepspeed: deepspeed_configs/zero2.json
weight_decay: 0.0
special_tokens:
  pad_token: <|end_of_text|>
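
The config above combines `lr_scheduler: cosine`, `learning_rate: 2e-6`, and `warmup_steps: 10`. As a rough illustration of what that schedule looks like, here is a minimal pure-Python sketch of linear warmup followed by cosine decay to zero. The function name `lr_at_step` is hypothetical, and `total_steps=195` is an assumption inferred from the training log (step 20 falls at roughly epoch 1.03, so 10 epochs would be about 195 optimizer steps); the actual scheduler implementation used by the trainer may differ in detail.

```python
import math


def lr_at_step(step, base_lr=2e-6, warmup_steps=10, total_steps=195):
    """Learning rate at a given optimizer step.

    Linear warmup from 0 to base_lr over warmup_steps, then cosine decay
    to 0 at total_steps. total_steps=195 is an assumed value, not one
    stated in the config.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the learning rate at a few points along the schedule.
    for step in (0, 5, 10, 100, 195):
        print(step, lr_at_step(step))
```

With a peak of 2e-6 and only 10 warmup steps, the learning rate reaches its maximum well before the end of the first epoch and decays smoothly from there.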

data/output/1min-v2-luxia-8b

This model is a fine-tuned version of maywell/Llama-3-Ko-Luxia-Instruct, trained on a modified version of the instructkr/ko_youtube_transcription_v2_filtered dataset. It achieves the following results on the evaluation set:

  • Loss: 2.0986
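
To make the evaluation loss easier to interpret, it can be converted to perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in this card; the helper name `perplexity_from_loss` is hypothetical.

```python
import math


def perplexity_from_loss(loss):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss)


if __name__ == "__main__":
    eval_loss = 2.0986  # evaluation loss reported above
    print(f"perplexity ~ {perplexity_from_loss(eval_loss):.2f}")
```

Under that assumption, a loss of 2.0986 corresponds to a perplexity of roughly 8.15.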

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 7
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 28
  • total_eval_batch_size: 7
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 10
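
The total batch sizes in the list above follow from the per-device values. A minimal sketch of that arithmetic, using a hypothetical helper `effective_batch_size`:

```python
def effective_batch_size(per_device_batch_size, num_devices, gradient_accumulation_steps=1):
    """Number of sequences contributing to one step across all devices."""
    return per_device_batch_size * num_devices * gradient_accumulation_steps


if __name__ == "__main__":
    # Training: 1 sequence per device, 7 devices, 4 accumulation steps.
    total_train = effective_batch_size(1, 7, gradient_accumulation_steps=4)
    # Evaluation: no gradient accumulation.
    total_eval = effective_batch_size(1, 7)
    print(total_train, total_eval)
```

This reproduces `total_train_batch_size: 28` and `total_eval_batch_size: 7`. With `sequence_len: 1024` and `pad_to_sequence_len: true` in the config, each optimizer step therefore covers 28 × 1024 = 28,672 token positions.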

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.6145        | 0.0513 | 1    | 2.7217          |
| 2.7668        | 0.2564 | 5    | 2.7018          |
| 2.6304        | 0.5128 | 10   | 2.5065          |
| 2.3635        | 0.7692 | 15   | 2.3580          |
| 2.4553        | 1.0256 | 20   | 2.2813          |
| 2.2344        | 1.2436 | 25   | 2.2339          |
| 2.4562        | 1.5    | 30   | 2.2017          |
| 2.0943        | 1.7564 | 35   | 2.1726          |
| 2.0695        | 2.0128 | 40   | 2.1425          |
| 1.8616        | 2.2308 | 45   | 2.1171          |
| 2.0498        | 2.4872 | 50   | 2.1040          |
| 1.9028        | 2.7436 | 55   | 2.0984          |
| 1.9057        | 3.0    | 60   | 2.0841          |
| 1.7464        | 3.2179 | 65   | 2.0784          |
| 1.8284        | 3.4744 | 70   | 2.0788          |
| 1.8866        | 3.7308 | 75   | 2.0761          |
| 1.8927        | 3.9872 | 80   | 2.0673          |
| 1.5778        | 4.2051 | 85   | 2.0779          |
| 1.7274        | 4.4615 | 90   | 2.0934          |
| 1.7431        | 4.7179 | 95   | 2.0652          |
| 1.8728        | 4.9744 | 100  | 2.0618          |
| 1.5729        | 5.1923 | 105  | 2.0837          |
| 1.4631        | 5.4487 | 110  | 2.0873          |
| 1.4758        | 5.7051 | 115  | 2.0744          |
| 1.5289        | 5.9615 | 120  | 2.0899          |
| 1.515         | 6.1795 | 125  | 2.0919          |
| 1.5757        | 6.4359 | 130  | 2.0978          |
| 1.5392        | 6.6923 | 135  | 2.0986          |
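
The log above can be scanned for the evaluation point with the lowest validation loss. The sketch below hard-codes the (epoch, step, validation loss) values from the results and picks the minimum; the names `EVAL_LOG` and `best_eval_point` are hypothetical, and this is only a reading of the logged numbers, not a statement about which checkpoint was published.

```python
# (epoch, step, validation_loss) copied from the training results above.
EVAL_LOG = [
    (0.0513, 1, 2.7217), (0.2564, 5, 2.7018), (0.5128, 10, 2.5065),
    (0.7692, 15, 2.3580), (1.0256, 20, 2.2813), (1.2436, 25, 2.2339),
    (1.5, 30, 2.2017), (1.7564, 35, 2.1726), (2.0128, 40, 2.1425),
    (2.2308, 45, 2.1171), (2.4872, 50, 2.1040), (2.7436, 55, 2.0984),
    (3.0, 60, 2.0841), (3.2179, 65, 2.0784), (3.4744, 70, 2.0788),
    (3.7308, 75, 2.0761), (3.9872, 80, 2.0673), (4.2051, 85, 2.0779),
    (4.4615, 90, 2.0934), (4.7179, 95, 2.0652), (4.9744, 100, 2.0618),
    (5.1923, 105, 2.0837), (5.4487, 110, 2.0873), (5.7051, 115, 2.0744),
    (5.9615, 120, 2.0899), (6.1795, 125, 2.0919), (6.4359, 130, 2.0978),
    (6.6923, 135, 2.0986),
]


def best_eval_point(log):
    """Return the (epoch, step, validation_loss) row with the lowest loss."""
    return min(log, key=lambda row: row[2])


if __name__ == "__main__":
    epoch, step, val_loss = best_eval_point(EVAL_LOG)
    final_val_loss = EVAL_LOG[-1][2]
    print(f"best: step {step} (epoch {epoch}), val loss {val_loss}")
    print(f"final minus best: {final_val_loss - val_loss:.4f}")
```

In this log the validation loss is lowest at step 100 (2.0618) and drifts slightly upward afterwards while the training loss keeps falling, which is a pattern often associated with the onset of overfitting.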

Framework versions

  • Transformers 4.40.2
  • Pytorch 2.1.2+cu118
  • Datasets 2.19.1
  • Tokenizers 0.19.1