How would you feel...
If you or I submitted this to the UGI Leaderboard?
I keep coming back to this model. I'd love to know how it scores.
I wanted to ask first because it's yours and maybe you don't want it out there.
This one: UGI-Leaderboard?
Looks like it's testing political bias, knowledge and refusals? I made no particular effort to influence those, so it'd be interesting to see how it differs from the base model.
Sure, go ahead and submit it, I'd be interested as well.
(as long as they're fine with it failing the coding tests)
Here's me asking it to review a sample addition function in TypeScript with very simple comments (without a proper system prompt):
If you submit it + they test it, link me to the results please.
It did good! Top 7 in 123B models. The least surprising thing to me was it was the closest to neutral of any 123B I have seen to date.
Cool, it scored higher than I expected!
> The least surprising thing to me was it was the closest to neutral of any 123B I have seen to date.
I guess it makes sense. It's biased towards being a [writer, author, novelist] or fully embracing a character in role playing. The default "Assistant" instruct training from Mistral is largely forgotten (which is why it'll respond with a random character when you ask who it is without a system prompt lol).
When I sort that by political leaning, it puts the most biased (positive or negative, depending on ascending or descending sorting) at the top.
I would have expected it to put 0 at the top (best) and rank everything else lower by how far it deviates from that (regardless of positive or negative).
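For what it's worth, the ordering I had in mind is just a sort on the absolute value. A rough sketch in TypeScript, where the model names, the scores, and the `politicalLean` field are all made up for illustration (I don't know how the leaderboard actually stores or sorts this column):

```typescript
// Hypothetical row shape; field names are assumptions, not the leaderboard's schema.
interface LeaderboardRow {
  model: string;
  politicalLean: number; // 0 = neutral, the sign only gives the direction
}

// Rank by distance from neutral: closest to 0 first, ignoring the sign.
function sortByNeutrality(rows: LeaderboardRow[]): LeaderboardRow[] {
  // Copy first so the caller's array is not reordered in place.
  return [...rows].sort(
    (a, b) => Math.abs(a.politicalLean) - Math.abs(b.politicalLean)
  );
}

// Made-up values, for illustration only.
const rows: LeaderboardRow[] = [
  { model: "model-a", politicalLean: -0.4 },
  { model: "model-b", politicalLean: 0.05 },
  { model: "model-c", politicalLean: 0.3 },
  { model: "model-d", politicalLean: -0.1 },
];

const ranked = sortByNeutrality(rows).map((r) => r.model);
console.log(ranked); // [ 'model-b', 'model-d', 'model-c', 'model-a' ]
```

A plain ascending or descending sort on the signed value would put model-a or model-c first instead, which is the behaviour I'm seeing.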