Reproducibility error

#1020
by cluebbers - opened

Hi,
I used the command from your FAQ to run the evaluation for myself.
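For reference, it was roughly of this shape (placeholders rather than my exact values, so the line below is only illustrative):

lm_eval --model_args="pretrained=<model>,revision=<revision>,dtype=<dtype>" --tasks=leaderboard --batch_size=auto --output_path=<output_path>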

After 13 hours of "Running loglikelihood requests", it ran into this error:
Running generate_until requests: 0%| | 0/1865 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/scratch-scc/users/u12246/environments/openllm_env/bin/lm-eval", line 8, in
sys.exit(cli_evaluate())
^^^^^^^^^^^^^^
File "/scratch-scc/users/u12246/lm-evaluation-harness/lm_eval/main.py", line 382, in cli_evaluate
results = evaluator.simple_evaluate(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch-scc/users/u12246/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/scratch-scc/users/u12246/lm-evaluation-harness/lm_eval/evaluator.py", line 296, in simple_evaluate
results = evaluate(
^^^^^^^^^
File "/scratch-scc/users/u12246/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/scratch-scc/users/u12246/lm-evaluation-harness/lm_eval/evaluator.py", line 468, in evaluate
resps = getattr(lm, reqtype)(cloned_reqs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/scratch-scc/users/u12246/lm-evaluation-harness/lm_eval/models/huggingface.py", line 1326, in generate_until
chunks = re_ords.get_batched(
^^^^^^^^^^^^^^^^^^^^
TypeError: Collator.get_batched() got an unexpected keyword argument 'reset_batch_fn'

There was also some INFO output:
[__init__.py:512] The tag xnli is already registered as a group, this tag will not be registered. This may affect tasks you want to call.
And these WARNINGs:
[task.py:337] [Task: leaderboard_musr_team_allocation] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.
[task.py:337] [Task: leaderboard_musr_object_placements] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.
[task.py:337] [Task: leaderboard_musr_murder_mysteries] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.
[task.py:337] [Task: leaderboard_ifeval] has_training_docs and has_validation_docs are False, using test_docs as fewshot_docs but this is not recommended.

Do you have a fix? And is there a downside to just using the upstream EleutherAI/lm-evaluation-harness instead?

Open LLM Leaderboard org

Hi @cluebbers ,

Let me try to help you! Could you please provide the exact command you used, what model you are trying to evaluate, and what hardware you are using?
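For the hardware and environment details, something along these lines would be enough (the checkout path below is just an example; adjust it to your setup):

# GPU model(s) and memory
nvidia-smi --query-gpu=name,memory.total --format=csv
# library versions in the environment you launch from
python -c "import torch, transformers; print(torch.__version__, transformers.__version__)"
# commit of the harness checkout you are running
git -C /path/to/lm-evaluation-harness rev-parse HEAD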

@alozowski I am getting the same issue. FWIW I am using transformers main and running:

accelerate launch --num_processes=2 --multi-gpu -m lm_eval --model hf --model_args "pretrained=/path/to/model,dtype=bfloat16" --tasks=leaderboard  --batch_size=auto --output_path=outputs/
Open LLM Leaderboard org

The command as shared doesn't give me quite enough to reproduce the problem. For instance, the pretrained=/path/to/model placeholder doesn't name an actual model, and without knowing what hardware you're using it's hard to diagnose potential compatibility or performance issues (e.g., multi-GPU, TPU, or CPU-only environments).

Could you confirm:

  1. The exact model you're trying to evaluate (e.g., meta-llama/Llama-3.2-1B)
  2. The type of hardware (e.g., number of GPUs and their specifications)
  3. Whether you installed lm-evaluation-harness from its main branch (see the quick check below)
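A TypeError like the one in your traceback usually means that the code calling Collator.get_batched and the Collator class itself come from different versions of the harness, so point 3 is worth double-checking. A rough way to do that (the path is an example, and details depend on how you installed it):

# which branch the local checkout is on
git -C /path/to/lm-evaluation-harness branch --show-current
# which copy of lm_eval Python actually imports (a stale installed copy can shadow the checkout)
python -c "import lm_eval; print(lm_eval.__file__)"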

Additionally, the command could look like this, following the example from the lm-evaluation-harness README:

lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-3.2-1B \
    --tasks leaderboard \
    --device cuda:0 \
    --batch_size 8

Looking forward to your response!

Hi @alozowski

Thanks for your comment! The issue was happening because the docs were outdated and pointed to the adding_all_changes branch instead of main. Things seem to be working fine now that the docs have been updated and I'm using main!
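For anyone else landing here, switching to main boiled down to roughly the following (the clone location and install steps are just a sketch of what worked for me):

git clone https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
git checkout main
pip install -e .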

Open LLM Leaderboard org

Great! Feel free to open a new discussion if you have any other questions!

alozowski changed discussion status to closed
