Evaluation

#1
by owao - opened

Hey! Really great initiative :)

I find the idea of having a handy small model for simple QA really attractive, but I was surprised that no evaluation results are presented.
While I obviously get the point of not sharing your personal benchmark dataset, why not evaluate the models on SimpleQA, for example?

Hi! I am still working on it. I may present the benchmark results next week, along with new model updates. I am still upgrading the model for QA.

Great to hear ;) thanks for your reply
