NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO · Is the SFT phase a full finetune or a lora?

Jan 18

First, congrats on the excellent model! I've used it only a bit for RP, and it seems noticeably better than even Mixtral-Instruct.

I know there has been a lot of discussion about how to finetune Mixtral, e.g. whether something is broken with the load balancing loss in Transformers, or whether DPO is the secret sauce. I see that you uploaded a QLoRA adapter for the DPO phase. But what about SFT? Was that also a QLoRA, or was it a full finetune?

If this model ends up being as good as it seems at first glance, it would be very helpful for the community to know what makes it so good. Perhaps a full-finetune SFT phase explains everything (other Mixtral finetunes all seem to be QLoRA). Any other non-obvious training details you could share would also be much appreciated.

teknium

NousResearch org Jan 18

The SFT phase was a full finetune.

teknium

NousResearch org Jan 18

Nothing non-standard.

lyua1225

Feb 8

Hi @teknium, great work!
May I ask a bit about the training settings? Did you freeze any part of the model (like the gating layer), or did you apply an auxiliary loss or any load-balancing trick when training?
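
For readers unfamiliar with the "auxiliary loss" mentioned above, here is a minimal pure-Python sketch of a Switch-Transformer-style load-balancing loss, generalized to top-k routing. It is illustrative only: it is not the Transformers implementation and says nothing about what was actually used to train this model. The function names and the division by `top_k` are assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits, top_k=2):
    """Switch-style auxiliary loss: num_experts * sum_i f_i * P_i.

    router_logits: one list of per-expert logits for each token.
    f_i: fraction of routing slots assigned to expert i
         (normalized by top_k, an assumption of this sketch).
    P_i: mean router probability given to expert i.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    frac_tokens = [0.0] * num_experts  # f_i
    mean_prob = [0.0] * num_experts    # P_i

    for logits in router_logits:
        probs = softmax(logits)
        # Indices of the top_k experts chosen for this token.
        chosen = sorted(range(num_experts), key=lambda i: probs[i], reverse=True)[:top_k]
        for i in chosen:
            frac_tokens[i] += 1.0 / (num_tokens * top_k)
        for i, p in enumerate(probs):
            mean_prob[i] += p / num_tokens

    return num_experts * sum(f * p for f, p in zip(frac_tokens, mean_prob))


# Two tokens spread over four experts vs. two tokens collapsed onto the same pair.
balanced = load_balancing_loss([[10, 10, 0, 0], [0, 0, 10, 10]])
collapsed = load_balancing_loss([[10, 10, 0, 0], [10, 10, 0, 0]])
```

In this sketch the loss is lowest when tokens are spread evenly across experts and grows as routing collapses onto a few of them, so `collapsed` comes out larger than `balanced`. In training, a term like this would typically be scaled by a small coefficient and added to the language-modeling loss.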