multiple_choice_score: there are 789 tasks in prompt
multiple_choice_score: reading tasks......
multiple_choice_score: failed to read task 43 of 789