200K Version

#7
by brucethemoose - opened

Separate from my previous request, have you considered training on Yi 200K instead? I believe the finetune doesn't need to be trained at the full 200K context to retain some of the base model's long-context performance.

Might be a good candidate for a LongLoRA if y'all are doing full finetuning now?

I think it would be nice to support the 200K version.
