This model is a LoRA fine-tuned version of mistralai/Mixtral-8x22B-Instruct-v0.1 on the mbpp (Mostly Basic Python Problems) dataset.
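Because the method section below specifies LoRA fine-tuning (finetuning_type: lora), the released weights are expected to be a PEFT adapter that is loaded on top of the Mixtral-8x22B-Instruct-v0.1 base model. The following is a minimal, untested sketch of that loading path; the adapter repository id is a placeholder, and sufficient GPU memory for the 8x22B base model is assumed.

```python
# Sketch: load the Mixtral-8x22B-Instruct base model and attach this LoRA adapter via PEFT.
# "<this-adapter-repo>" is a placeholder for the actual repository id of this model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mixtral-8x22B-Instruct-v0.1"
adapter_id = "<this-adapter-repo>"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,   # training used bf16
    device_map="auto",            # requires `accelerate`
)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Prompt in the Mistral/Mixtral instruction format (template: mistral).
prompt = "[INST] Write a Python function that returns the n-th Fibonacci number. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```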

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

The following hyperparameters were used during training (collected into a single config sketch after the lists below):

method

  • stage: sft
  • finetuning_type: lora
  • lora_target: all
  • deepspeed: examples/deepspeed/ds_z3_offload_config.json

dataset

  • dataset: mbpp
  • template: mistral
  • cutoff_len: 2048
  • max_samples: 316
  • overwrite_cache: true
  • preprocessing_num_workers: 16

train

  • per_device_train_batch_size: 1
  • gradient_accumulation_steps: 2
  • learning_rate: 1.0e-4
  • num_train_epochs: 3
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: true
  • ddp_timeout: 180000000
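
For convenience, the settings above can be gathered into a single LLaMA-Factory-style YAML config. The sketch below merely re-emits the values already listed; model_name_or_path, output_dir, and the llamafactory-cli invocation are assumptions based on the standard LLaMA-Factory workflow and are not stated in this card.

```python
# Sketch: reassemble the hyperparameters listed above into a LLaMA-Factory-style YAML config.
# model_name_or_path and output_dir are placeholders/assumptions, not values from this card.
import yaml  # pip install pyyaml

config = {
    # method
    "model_name_or_path": "mistralai/Mixtral-8x22B-Instruct-v0.1",  # assumed base model
    "stage": "sft",
    "finetuning_type": "lora",
    "lora_target": "all",
    "deepspeed": "examples/deepspeed/ds_z3_offload_config.json",
    # dataset
    "dataset": "mbpp",
    "template": "mistral",
    "cutoff_len": 2048,
    "max_samples": 316,
    "overwrite_cache": True,
    "preprocessing_num_workers": 16,
    # train
    "output_dir": "saves/mixtral-8x22b-mbpp-lora",  # placeholder
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 2,
    "learning_rate": 1.0e-4,
    "num_train_epochs": 3,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "bf16": True,
    "ddp_timeout": 180000000,
}

with open("mixtral_mbpp_lora_sft.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# Typical launch command in the LLaMA-Factory workflow (assumption):
print("llamafactory-cli train mixtral_mbpp_lora_sft.yaml")
```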

Framework versions

  • PEFT 0.14.0
  • Transformers 4.47.0
  • Pytorch 2.5.1+cu124
  • Datasets 2.14.6
  • Tokenizers 0.21.0

wandb

(two Weights & Biases training-run screenshots were embedded here in the original card; images not reproduced)
