If I understand correctly, evaluating MATH-500 requires 64*500 model calls?

#149
by Rorschaaaach - opened

Does each model, such as Claude or GPT-4o, need to be called that many times?
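
For reference, here is the arithmetic behind the number in the title, as a minimal sketch. It assumes that the 64 is a per-problem sample count and that each sample is one model call. The function and parameter names are hypothetical and are not taken from any evaluation harness.

```python
def total_model_calls(num_problems: int, samples_per_problem: int) -> int:
    """Count calls for one model, assuming one call per sample (an assumption)."""
    return num_problems * samples_per_problem


# 500 problems in MATH-500, 64 samples per problem (the figure from the title).
calls_per_model = total_model_calls(num_problems=500, samples_per_problem=64)
print(calls_per_model)  # 32000
```

Under these assumptions the count applies per model, so evaluating two models would take twice as many calls.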
