Trained using TRL. It wouldn't fit on my 3090 until I significantly reduced the batch size and applied 4-bit quantization.
It didn't exactly converge.
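
For a rough sense of why both changes were needed on a 24 GB card, here is a back-of-the-envelope sketch. It only counts the memory for the weights themselves and shows how gradient accumulation can make up for a smaller per-device batch. The 7B parameter count and the batch sizes are hypothetical placeholders, not the values from this run, and the function names are made up for illustration.

```python
# Rough weight-only estimate. It ignores activations, gradients,
# optimizer state, and quantization overhead (scales, zero points),
# so real usage will be noticeably higher.
GIB = 1024 ** 3


def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Memory needed to hold the model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / GIB


def grad_accum_steps(target_batch: int, per_device_batch: int) -> int:
    """Accumulation steps needed to reach at least the target batch size."""
    # Ceiling division, so the effective batch never falls below the target.
    return -(-target_batch // per_device_batch)


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical model size, not from this run
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: {weight_memory_gib(params, bits):.1f} GiB")
    # Hypothetical batch sizes: target 32, only 2 samples fit per step.
    print("accumulation steps:", grad_accum_steps(32, 2))
```

In practice the 4-bit loading would go through something like a bitsandbytes quantization config in Transformers, and the accumulation steps would be set in the trainer arguments. Note that a smaller batch combined with quantized weights can also make training noisier, which may be related to the convergence problem, though that is a guess.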