zephyr-8b-sft-full

This model is a fine-tuned version of meta-llama/Llama-3.1-8B on the HuggingFaceH4/ultrachat_200k dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0747

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 16
  • total_train_batch_size: 128
  • total_eval_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0

Training results

Training Loss Epoch Step Validation Loss
1.103 0.1052 100 1.0989
1.0867 0.2103 200 1.0966
1.111 0.3155 300 1.1012
1.0974 0.4206 400 1.0966
1.0898 0.5258 500 1.0920
1.0749 0.6309 600 1.0876
1.0847 0.7361 700 1.0831
1.0749 0.8412 800 1.0778
1.055 0.9464 900 1.0720
0.9184 1.0515 1000 1.0817
0.8955 1.1567 1100 1.0779
0.914 1.2618 1200 1.0758
0.9098 1.3670 1300 1.0698
0.9126 1.4721 1400 1.0667
0.9032 1.5773 1500 1.0604
0.8882 1.6824 1600 1.0546
0.8847 1.7876 1700 1.0490
0.8831 1.8927 1800 1.0455
0.8781 1.9979 1900 1.0413
0.7197 2.1030 2000 1.0822
0.7137 2.2082 2100 1.0841
0.7115 2.3134 2200 1.0800
0.7178 2.4185 2300 1.0789
0.7063 2.5237 2400 1.0777
0.6964 2.6288 2500 1.0755
0.7121 2.7340 2600 1.0742
0.7049 2.8391 2700 1.0748
0.7024 2.9443 2800 1.0747

Framework versions

  • Transformers 4.45.2
  • Pytorch 2.2.2+rocm5.7
  • Datasets 3.2.0
  • Tokenizers 0.20.3
Downloads last month
11
Safetensors
Model size
8.03B params
Tensor type
BF16
·
Inference Providers NEW
This model is not currently available via any of the supported third-party Inference Providers, and the model is not deployed on the HF Inference API.

Model tree for li-muyang/zephyr-8b-sft-full

Finetuned
(870)
this model

Dataset used to train li-muyang/zephyr-8b-sft-full