Is the SFT phase a full finetune or a LoRA?
First, congrats on the excellent model! I've used it only a bit for RP, and it seems noticeably better than even Mixtral-Instruct.
I know there has been a lot of discussion about how to finetune Mixtral, e.g. whether something is broken with the load balancing loss in Transformers, or whether DPO is the secret sauce. I see that you uploaded a QLoRA adapter for the DPO phase. But what about SFT? Was that also a QLoRA, or was it a full finetune?
If this model ends up being as good as it seems at first glance, it would be very helpful for the community to know what makes it so good. Perhaps a full-finetune SFT phase explains everything (other Mixtral finetunes all seem to be QLoRA). Any other non-obvious training details you could share would also be much appreciated.
The SFT phase was a full finetune.
Nothing non-standard.