Major Translation Quality Degradation Past 30 Messages (15 chat pairs)
The title pretty much.
I'm not sure how different the v5 dataset is, but after noticing this issue I checked the earlier versions that are public and found that each training example has ~15 chat pairs.
Given the degradation I'm seeing, I'll assume the v5 dataset is no different, which means you'll want to cap the message limit at 30.
It can be easily done with Luna through the local LLM translation settings menu, but for SillyTavern you'll need the extension below:
Yeah, the dataset is limited to 1024 tokens, which works out to about 30 messages (15 pairs) on average. That's a bit unfortunate, though. I'd guess the LLM overfits to the dataset's context length and forgets how to handle anything longer.
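For a rough sense of the arithmetic behind that estimate (the 1024-token cap is from the dataset; the per-message figure is just back-of-the-envelope):

```python
# Back-of-the-envelope: how many tokens each message gets on average
# if a 1024-token training example holds ~30 messages (15 pairs).
DATASET_TOKEN_LIMIT = 1024
MESSAGES_PER_EXAMPLE = 30  # ~15 source/translation pairs

tokens_per_message = DATASET_TOKEN_LIMIT / MESSAGES_PER_EXAMPLE
print(f"~{tokens_per_message:.0f} tokens per message")  # prints "~34 tokens per message"
```

So VN lines much longer than ~34 tokens would fill the trained window with fewer messages, which is why the 30-message figure is only an average.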
Another issue I've noticed is that the LLM is influenced a bit too much by its own responses.
Even with the message limit at 30, some translation mistakes just seem to stick once they get into the context. The first hundred messages might be fine, but as the VN progresses the translations get worse and worse, because mistranslations keep getting fed back into the context: each bad output conditions the next one, so the errors effectively never leave the window regardless of the message cap.
Since I ended up using SillyTavern with the help of Luna Translator and a slightly janky AutoHotkey script, I don't suffer from this too much: I can just edit messages, or hide all previous messages from the context and start fresh.
However, this isn't so simple for those who use VNTL through Luna Translator alone, which has none of the features that help fix output quality. There, the only way to reset output quality is to close and re-open the program.
That said, while a message cap doesn't stop the inevitable quality drop, it does slow it down. I've found that a limit of 10-20 messages is usually a good amount to set it to.
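For anyone curious what a message cap is actually doing under the hood, here's a minimal sketch, not how Luna or the SillyTavern extension implements it, just the general idea of a sliding window (the function name and limit default are made up for illustration):

```python
def capped_context(history, limit=20):
    """Keep only the most recent `limit` messages.

    `history` is a list of chat messages, oldest first. Capping the
    context like this is what the message-limit setting amounts to:
    old mistranslations eventually age out of the window instead of
    conditioning every later translation. A limit of 0 disables the cap.
    """
    return history[-limit:] if limit else history

# Example: with a limit of 4, only the last 4 messages are sent to the model.
msgs = [f"msg {i}" for i in range(10)]
print(capped_context(msgs, limit=4))  # ['msg 6', 'msg 7', 'msg 8', 'msg 9']
```

The catch, as described above, is that a bad translation produced *inside* the window still gets re-fed on every turn until it ages out, which is why the cap slows the degradation rather than preventing it.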