200K Version?
Have you considered training this on the Yi 200K base instead of the 4K model?
Seems like this would be much better for storytelling, especially since your dataset naturally contains very long passages.
And FYI, long-context training is quite doable on a single A100, or even a 48GB GPU, especially if you use Unsloth to train. The training context doesn't have to be anywhere near 200K to get good long-context performance.
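Roughly, the Unsloth QLoRA setup looks something like the sketch below. This is untested and the model name, sequence length, and LoRA hyperparameters are just placeholders, not a tuned recipe:

```python
from unsloth import FastLanguageModel

max_seq_length = 16384  # whatever fits in VRAM; well below 200K is still fine

# Load the 200K base in 4-bit (QLoRA-style) at a reduced training context.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="01-ai/Yi-34B-200K",  # example base; swap in your own
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

# Attach LoRA adapters; gradient checkpointing is what keeps long
# sequences from OOMing.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing=True,
)
```

From there you can train with your usual TRL/transformers trainer on top of the returned `model`.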
Well, to be honest, OOM is the main enemy stopping me from doing QLoRA on the 200K version. I just took a look at Unsloth, and it does look like exactly the project I need. So, I know what I want to do next.
Again, you can just lower the context size in the config, train at whatever length you can fit, then set it back, and the model will still largely retain its 200K performance.
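Concretely, that config round-trip can be as simple as editing `max_position_embeddings` in the checkpoint's `config.json`. The path and numbers below are illustrative:

```python
import json

cfg_path = "Yi-34B-200K/config.json"  # path to your local checkpoint

with open(cfg_path) as f:
    cfg = json.load(f)

original_ctx = cfg["max_position_embeddings"]  # the full 200K window
cfg["max_position_embeddings"] = 16384         # shrink to what fits in VRAM

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)

# ... run QLoRA training here ...

# Afterwards, restore the full window so inference sees 200K again.
cfg["max_position_embeddings"] = original_ctx
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```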