200K Version
#7 by brucethemoose
Separate from my previous request, have you considered training on Yi 200K instead? I believe the finetune doesn't need to be trained at the full 200K context length to keep some of the base model's long-context performance.
It might also be a good candidate for LongLoRA, if y'all are doing full finetuning now?
I think it would be nice to support the 200K version.