Llama-3.2-400M-Amharic-Poems-Stories-V7

This model is a fine-tuned version of rasyosef/Llama-3.2-400M-Amharic on an unnamed dataset of Amharic poems and stories. It achieves the following results on the evaluation set:

  • Loss: 1.0906
  • Model Preparation Time: 0.0031

Model description

More information needed

Intended uses & limitations

More information needed
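
Pending fuller documentation, the sketch below shows one plausible way to load this checkpoint for Amharic text generation with the transformers library. The repository id is taken from this card; the prompt and the generation parameters are illustrative assumptions, not documented settings.

```python
# Minimal sketch: load the checkpoint for Amharic text generation.
# Repo id comes from this card; generation settings are illustrative assumptions.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="yosefw/Llama-3.2-400M-Amharic-Poems-Stories-V7",
)

prompt = "አንድ ጊዜ"  # assumed story-opener prompt ("once upon a time" style)
output = generator(prompt, max_new_tokens=128, do_sample=True, temperature=0.8)
print(output[0]["generated_text"])
```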

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • lr_scheduler_warmup_steps: 250
  • num_epochs: 4
  • mixed_precision_training: Native AMP
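
As a hedged sketch, the listed values map onto Hugging Face TrainingArguments roughly as follows. The output_dir is a placeholder, and note that when both warmup settings are given, transformers uses warmup_steps in preference to warmup_ratio.

```python
# Hedged reconstruction of the listed hyperparameters as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.2-400m-amharic-poems-stories",  # placeholder
    learning_rate=8e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size: 4 * 2 = 8
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    warmup_steps=250,               # takes precedence over warmup_ratio when > 0
    num_train_epochs=4,
    fp16=True,                      # "Native AMP" mixed precision
)
```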

Training results

| Training Loss | Epoch  | Step | Validation Loss | Model Preparation Time |
|:-------------:|:------:|:----:|:---------------:|:----------------------:|
| 2.7914        | 0.2161 | 500  | 1.8555          | 0.0031                 |
| 1.5562        | 0.4322 | 1000 | 1.2348          | 0.0031                 |
| 1.1782        | 0.6484 | 1500 | 1.0413          | 0.0031                 |
| 1.1022        | 0.8645 | 2000 | 0.9654          | 0.0031                 |
| 0.8029        | 1.0806 | 2500 | 1.0335          | 0.0031                 |
| 0.4356        | 1.2967 | 3000 | 0.9861          | 0.0031                 |
| 0.4407        | 1.5129 | 3500 | 0.9813          | 0.0031                 |
| 0.4433        | 1.7290 | 4000 | 0.9752          | 0.0031                 |
| 0.4439        | 1.9451 | 4500 | 0.9300          | 0.0031                 |
| 0.218         | 2.1612 | 5000 | 1.0673          | 0.0031                 |
| 0.1549        | 2.3774 | 5500 | 1.0764          | 0.0031                 |
| 0.1602        | 2.5935 | 6000 | 1.0732          | 0.0031                 |
| 0.1525        | 2.8096 | 6500 | 1.0654          | 0.0031                 |
| 0.1473        | 3.0257 | 7000 | 1.0759          | 0.0031                 |
| 0.0907        | 3.2418 | 7500 | 1.0896          | 0.0031                 |
| 0.092         | 3.4580 | 8000 | 1.0920          | 0.0031                 |
| 0.0925        | 3.6741 | 8500 | 1.0912          | 0.0031                 |
| 0.0972        | 3.8902 | 9000 | 1.0906          | 0.0031                 |
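
Since the reported validation loss is a mean per-token cross-entropy in nats, the final value of 1.0906 corresponds to a perplexity of exp(1.0906) ≈ 2.98, as the one-liner below confirms.

```python
import math

# Convert the final validation loss (per-token cross-entropy, nats) to perplexity.
print(math.exp(1.0906))  # ≈ 2.98
```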

Framework versions

  • Transformers 4.45.0
  • Pytorch 2.4.1+cu121
  • Datasets 3.3.2
  • Tokenizers 0.20.3