This model is a fine-tuned version of [Mixtral-8x22B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1) on the mbpp dataset.

# Model description

More information needed

# Intended uses & limitations

More information needed

# Training and evaluation data

More information needed

# Training hyperparameters

The following hyperparameters were used during training:

### method

- stage: sft
- finetuning_type: lora
- lora_target: all
- deepspeed: examples/deepspeed/ds_z3_offload_config.json

### dataset

- dataset: mbpp
- template: mistral
- cutoff_len: 2048
- max_samples: 316
- overwrite_cache: true
- preprocessing_num_workers: 16

### train

- per_device_train_batch_size: 1
- gradient_accumulation_steps: 2
- learning_rate: 1.0e-4
- num_train_epochs: 3
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: true
- ddp_timeout: 180000000

# Framework versions

- PEFT 0.14.0
- Transformers 4.47.0
- Pytorch 2.5.1+cu124
- Datasets 2.14.6
- Tokenizers 0.21.0

# wandb

![image/png](https://cdn-uploads.huggingface.co/production/uploads/653d073348e79d7e63bb7a70/Tb2-hqybc50cmAAaftlsN.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/653d073348e79d7e63bb7a70/BKHxVIuF38o4w50LeYiAI.png)
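
# How to use

Because training used PEFT (LoRA) on top of the base model, the adapter weights are loaded alongside Mixtral-8x22B-Instruct-v0.1 at inference time. The snippet below is a minimal sketch, not an official usage recipe: `ADAPTER_ID` is a hypothetical placeholder for wherever this run's adapter weights are hosted, and the prompt is wrapped in `[INST] ... [/INST]` to match the `mistral` template used during training.

```python
# Minimal sketch: load the base model in bf16 (as used in training) and attach the LoRA adapter.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "mistralai/Mixtral-8x22B-Instruct-v0.1"
ADAPTER_ID = "your-username/mixtral-8x22b-instruct-mbpp-lora"  # hypothetical adapter repo id

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    torch_dtype=torch.bfloat16,  # matches the bf16 training setting above
    device_map="auto",
)
# Attach the fine-tuned LoRA weights on top of the frozen base model.
model = PeftModel.from_pretrained(model, ADAPTER_ID)

# Prompt formatted with the mistral instruction template used at training time.
prompt = "[INST] Write a Python function to check whether a number is prime. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the adapter has instead been merged into the base weights and published as a standalone checkpoint, the `PeftModel.from_pretrained` step can be dropped and the merged repo id passed directly to `AutoModelForCausalLM.from_pretrained`.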