Offline evaluation (#13), opened by kaiwang13
How can I run an offline evaluation of this benchmark locally?
kaiwang13 changed discussion status to closed
Did you find a way to do offline evaluation?
This leaderboard uses https://github.com/EleutherAI/lm-evaluation-harness.git. You can run the evaluation offline with it.
The README.md says to use eval_medical_llm.py, but I couldn't find it.
Could you please elaborate a bit on how to use lm-evaluation-harness for offline evaluation?
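For anyone landing here, below is a minimal sketch of how a local run with lm-evaluation-harness might be set up, assuming the harness is installed (v0.4 or later, which provides the `lm_eval` command). The helper `build_lm_eval_command` is hypothetical, the model id is a placeholder, and the task names (`medmcqa`, `medqa_4options`, `pubmedqa`) are my assumption about which harness tasks correspond to this benchmark. The thread does not confirm the leaderboard's exact task list or few-shot settings, so treat the defaults as guesses.

```python
import shlex
from typing import List, Sequence


def build_lm_eval_command(
    model_id: str,
    tasks: Sequence[str],
    num_fewshot: int = 0,
    batch_size: int = 8,
    output_path: str = "results",
) -> List[str]:
    """Build an argument list for the `lm_eval` CLI (hypothetical helper).

    Only flags documented by lm-evaluation-harness v0.4+ are used.
    The default values are assumptions, not the leaderboard's settings.
    """
    if not tasks:
        raise ValueError("at least one task name is required")
    return [
        "lm_eval",
        "--model", "hf",                            # Hugging Face transformers backend
        "--model_args", f"pretrained={model_id}",   # local path or Hub model id
        "--tasks", ",".join(tasks),                 # comma-separated task names
        "--num_fewshot", str(num_fewshot),
        "--batch_size", str(batch_size),
        "--output_path", output_path,               # where result JSON files are written
    ]


if __name__ == "__main__":
    # Placeholder model id and assumed task names; replace with your own.
    command = build_lm_eval_command(
        "your-org/your-model",
        ["medmcqa", "medqa_4options", "pubmedqa"],
    )
    # Print the command so it can be inspected or pasted into a shell.
    print(shlex.join(command))
```

The printed command can be run in a shell or passed to `subprocess.run(command, check=True)`. For a run without network access, the model and datasets would need to be downloaded into the local cache beforehand; setting `HF_HUB_OFFLINE=1` and `HF_DATASETS_OFFLINE=1` should then keep the Hugging Face libraries from trying to reach the Hub.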