Evaluation
#1, opened by owao
Hey! Really great initiative :)
I find the idea of having a handy small model for simple QA really attractive, but I was surprised that no evaluation results are presented.
While I obviously get the point of not sharing your personal benchmark dataset, why not evaluate the models on a public benchmark such as SimpleQA?
Hi! I am still working on it. Maybe next week I will present the benchmark results along with new model updates. I am still upgrading the model for QA.
Great to hear ;) thanks for your reply