If I understand correctly, evaluating MATH-500 requires 64*500 model calls?
#149 by Rorschaaaach - opened
Does each model, such as Claude and GPT-4o, really need to be called that many times?
hao ma
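For reference, here is the arithmetic behind the question as a minimal sketch. It assumes 64 samples per problem, which is the figure from the title and is not confirmed in this thread; `total_model_calls` is a hypothetical helper written only for illustration.

```python
def total_model_calls(num_problems: int, samples_per_problem: int) -> int:
    """Total generations if every problem is sampled the same number of times."""
    return num_problems * samples_per_problem


# MATH-500 contains 500 problems.
# 64 samples per problem is the assumption made in the question, not a confirmed setting.
calls = total_model_calls(num_problems=500, samples_per_problem=64)
print(calls)  # 32000
```

Under that assumption, each evaluated model would need 32,000 calls; with a single sample per problem the same function gives 500.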