Which datasets were used for training, fine-tuning, and inference?

#14
by Bistolero - opened

Hello, congratulations on this great work! I would like to know which datasets were used for the information you show. More specifically, what data was used for BGE-M3, multilingual-e5-large-instruct, and e5-mistral-7b-instruct?

Yes, I am wondering about the length and batch size (e.g., when using DeepSpeed ZeRO-3 with gradient checkpointing enabled). Any suggestions?
