This model is a fine-tuned version of Mixtral-8x22B-Instruct-v0.1 on the mbpp (Mostly Basic Python Problems) dataset.

# Model description
More information needed

# Intended uses & limitations
More information needed
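
The card does not spell out intended uses, but the training data (MBPP) targets short Python programming problems. Below is a minimal inference sketch, assuming the standard Transformers + PEFT loading flow; `ADAPTER` is a placeholder for this repo's id, and the prompt and generation settings are illustrative only.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "mistralai/Mixtral-8x22B-Instruct-v0.1"
ADAPTER = "your-org/this-lora-adapter"  # placeholder: substitute this repo's id

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER)

# `template: mistral` in the training config matches the base tokenizer's
# built-in chat template, so apply_chat_template should produce the right prompt.
messages = [{"role": "user", "content": "Write a function to find the shared elements from the given two lists."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For deployment, `model.merge_and_unload()` can bake the adapter into the base weights so no PEFT dependency is needed at serving time.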

# Training and evaluation data
More information needed

# Training hyperparameters
The following hyperparameters were used during training; a rough code equivalent follows each subsection.

### method
- stage: sft
- finetuning_type: lora
- lora_target: all
- deepspeed: examples/deepspeed/ds_z3_offload_config.json
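
For readers reproducing the setup outside LLaMA-Factory: `finetuning_type: lora` with `lora_target: all` applies LoRA to every linear layer. A rough PEFT equivalent is sketched below; the rank and alpha are not stated on this card, so the values shown are LLaMA-Factory's defaults and should be treated as assumptions.

```python
from peft import LoraConfig, TaskType

# Approximate PEFT equivalent of `finetuning_type: lora` + `lora_target: all`.
# r/alpha are NOT on this card; 8/16 are LLaMA-Factory defaults (assumption).
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules="all-linear",  # LoRA on every linear layer, as `lora_target: all` does
    r=8,
    lora_alpha=16,
    lora_dropout=0.0,
)
```

The `deepspeed` entry points at LLaMA-Factory's bundled ZeRO-3 CPU-offload preset, which is what lets an 8x22B LoRA run fit in limited GPU memory; any equivalent ZeRO-3 offload config should work.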

### dataset
- dataset: mbpp
- template: mistral
- cutoff_len: 2048
- max_samples: 316
- overwrite_cache: true
- preprocessing_num_workers: 16
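
For reference, here is a minimal sketch of what this preprocessing amounts to, assuming the Hugging Face `mbpp` schema (`text` holds the problem statement, `code` the reference solution); the actual LLaMA-Factory pipeline differs in details.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x22B-Instruct-v0.1")
mbpp = load_dataset("mbpp", split="train")

def to_chat(example):
    # One instruction/response pair per MBPP task, mirroring `template: mistral`.
    messages = [
        {"role": "user", "content": example["text"]},
        {"role": "assistant", "content": example["code"]},
    ]
    # Truncate to the configured cutoff_len of 2048 tokens.
    ids = tokenizer.apply_chat_template(messages, truncation=True, max_length=2048)
    return {"input_ids": ids}

# max_samples: 316 caps the number of training examples.
train = mbpp.select(range(min(316, len(mbpp)))).map(to_chat)
```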

### train
- per_device_train_batch_size: 1
- gradient_accumulation_steps: 2
- learning_rate: 1.0e-4
- num_train_epochs: 3
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: true
- ddp_timeout: 180000000
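
These options map nearly one-to-one onto `transformers.TrainingArguments`; the sketch below shows the correspondence (`output_dir` is a placeholder). With a per-device batch size of 1 and 2 accumulation steps, the effective batch size is 2 × the number of GPUs.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mixtral-8x22b-mbpp-lora",  # placeholder
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,         # effective batch = 2 x num_gpus
    learning_rate=1e-4,
    num_train_epochs=3,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                      # 10% of steps spent warming up
    bf16=True,
    ddp_timeout=180000000,                 # very long timeout for slow ZeRO-3 offload steps
    deepspeed="examples/deepspeed/ds_z3_offload_config.json",
)
```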

# Framework versions
- PEFT 0.14.0
- Transformers 4.47.0
- PyTorch 2.5.1+cu124
- Datasets 2.14.6
- Tokenizers 0.21.0