AuriAetherwiing committed
Commit: bfcf4ca
Parent(s): c01c14f
Additional training notes

README.md CHANGED
@@ -29,9 +29,11 @@ Model was trained by Auri.
 
 **Training notes**
 
-This model was trained for 2 epochs on 10k rows (~18.7M tokens), taken equally from Erebus-87k and r_shortstories_24k datasets.
-
-
+This model was trained for 2 epochs on 10k rows (~18.7M tokens), taken equally from the Erebus-87k and r_shortstories_24k datasets. I've also normalized punctuation to ASCII on the train split, so mismatched quote marks should no longer be an issue. Whitespace is normalized as well, so double spaces after a period should be gone too.
+
+It was trained with rsLoRA on a 5x3090Ti workstation for 7.5 hours. I switched back to Axolotl for this run, as LF simply refused to run on this workstation at all. It's also a bf16 LoRA this time. Overall, training went much more smoothly than last time: I had attempted to train Qwen Sugarquill several times before, but the loss jumped wildly. An effective batch size of 40, rsLoRA, and the paged_ademamix_8bit optimizer seemingly solved this issue completely.
+
+Thanks to Kearm for providing compute for this training run!
 
 **Format**
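The punctuation and whitespace normalization mentioned in the added notes could look like the following minimal Python sketch. This is illustrative only: the commit does not include the actual preprocessing code, and the mapping table and `normalize` function name are assumptions.

```python
import re

# Hypothetical mapping of common "smart" punctuation to ASCII equivalents;
# the model card does not specify the exact characters that were normalized.
ASCII_PUNCT = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en/em dashes
    "\u2026": "...",                # horizontal ellipsis
}

def normalize(text: str) -> str:
    # Replace each non-ASCII punctuation mark with its ASCII counterpart.
    for uni, ascii_equiv in ASCII_PUNCT.items():
        text = text.replace(uni, ascii_equiv)
    # Collapse runs of spaces (e.g. double spaces after a period) to one.
    return re.sub(r" {2,}", " ", text)
```

Applied over every row of the train split, a pass like this removes mismatched curly quotes and doubled spaces before tokenization.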