Great captioning model
Thanks for your work. I find this 2.0 version to be one of the best autocaptioners at the moment.
I'm still struggling to get any autocaptioner to output basic stuff like "front view of", "side view of", "with eyes closed", "with eyes open".
I don't see the point of autocaptioners describing details (what's in the background, the supposed age of the character, which is often wrong or unwanted, same for facial expression, the type of "atmosphere") if the basics aren't covered (angle of view, position, especially for NSFW, hair and eye color, whether the character is looking at the viewer or away...).
If, in your next versions, you manage to make sure such a basic description is covered in the first tokens, or is the only output when an option is selected, that would be awesome! :-)