Reproducibility error

#1020
by cluebbers - opened

Hi,
I used the command from your FAQ to run the evaluation for myself.
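For reference, it was roughly of this shape (placeholders rather than my exact values, so the line below is only illustrative):

lm_eval --model_args="pretrained=<model>,revision=<revision>,dtype=<dtype>" --tasks=leaderboard --batch_size=auto --output_path=<output_path>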

After 13 hours of "Running loglikelihood requests", it ran into this error:
Running generate_until requests: 0%| | 0/1865 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/scratch-scc/users/u12246/environments/openllm_env/bin/lm-eval", line 8, in
sys.exit(cli_evaluate())
^^^^^^^^^^^^^^
File "/scratch-scc/users/u12246/lm-evaluation-harness/lm_eval/main.py", line 382, in cli_evaluate
results = evaluator.simple_evaluate(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch-scc/users/u12246/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/scratch-scc/users/u12246/lm-evaluation-harness/lm_eval/evaluator.py", line 296, in simple_evaluate
results = evaluate(
^^^^^^^^^
File "/scratch-scc/users/u12246/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/scratch-scc/users/u12246/lm-evaluation-harness/lm_eval/evaluator.py", line 468, in evaluate
resps = getattr(lm, reqtype)(cloned_reqs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch-scc/users/u12246/lm-evaluation-harness/lm_eval/models/huggingface.py", line 1326, in generate_until
chunks = re_ords.get_batched(
^^^^^^^^^^^^^^^^^^^^
TypeError: Collator.get_batched() got an unexpected keyword argument 'reset_batch_fn'

There was also some INFO output:
[__init__.py:512] The tag xnli is already registered as a group, this tag will not be registered. This may affect tasks you want to call.
And these WARNINGs:
[task.py:337] [Task: leaderboard_musr_team_allocation] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.
[task.py:337] [Task: leaderboard_musr_object_placements] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.
[task.py:337] [Task: leaderboard_musr_murder_mysteries] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.
[task.py:337] [Task: leaderboard_ifeval] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.

Do you have a fix? And is there a downside to just using the upstream EleutherAI/lm-evaluation-harness instead?

Open LLM Leaderboard org

Hi @cluebbers ,

Let me try to help you! Could you please provide the exact command you used, what model you are trying to evaluate, and what hardware you are using?
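For the hardware and environment details, something along these lines would be enough (the checkout path below is just an example; adjust it to your setup):

# GPU model(s) and memory
nvidia-smi --query-gpu=name,memory.total --format=csv
# library versions in the environment you launch from
python -c "import torch, transformers; print(torch.__version__, transformers.__version__)"
# commit of the harness checkout you are running
git -C /path/to/lm-evaluation-harness rev-parse HEAD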

@alozowski I am getting the same issue. FWIW I am using transformers main and running:

accelerate launch --num_processes=2 --multi-gpu -m lm_eval --model hf --model_args "pretrained=/path/to/model,dtype=bfloat16" --tasks=leaderboard  --batch_size=auto --output_path=outputs/
Open LLM Leaderboard org

The command as shared doesn't give me quite enough to reproduce the problem. For instance, the pretrained=/path/to/model placeholder doesn't name an actual model, and without knowing what hardware you're using it's hard to diagnose potential compatibility or performance issues (e.g., multi-GPU, TPU, or CPU-only environments).

Could you confirm:

  1. The exact model you're trying to evaluate (e.g., meta-llama/Llama-3.2-1B)
  2. The type of hardware (e.g., number of GPUs and their specifications)
  3. Whether you installed lm-evaluation-harness from its main branch (see the quick check below)
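A TypeError like the one in your traceback usually means that the code calling Collator.get_batched and the Collator class itself come from different versions of the harness, so point 3 is worth double-checking. A rough way to do that (the path is an example, and details depend on how you installed it):

# which branch the local checkout is on
git -C /path/to/lm-evaluation-harness branch --show-current
# which copy of lm_eval Python actually imports (a stale installed copy can shadow the checkout)
python -c "import lm_eval; print(lm_eval.__file__)"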

Additionally, the command could look like this, following the example from the lm-evaluation-harness README:

lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-3.2-1B \
    --tasks leaderboard \
    --device cuda:0 \
    --batch_size 8

Looking forward to your response!

Hi @alozowski

Thanks for your comment! The issue was happening because the docs were outdated and pointed to the adding_all_changes branch instead of main. Things seem to be working fine now that the docs have been updated and I'm using main!
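For anyone else landing here, switching to main boiled down to roughly the following (the clone location and install steps are just a sketch of what worked for me):

git clone https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
git checkout main
pip install -e .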

Open LLM Leaderboard org

Great! Feel free to open a new discussion if you have any other questions!

alozowski changed discussion status to closed
