Are instruction models evaluated with a chat template?

#1
by alexrs - opened

In the Hugging Face harness fork used by the Open LLM Leaderboard it is possible to specify the --apply_chat_template and --fewshot_as_multiturn options for instruction models (https://huggingface.co/docs/leaderboards/open_llm_leaderboard/about#reproducibility). That does not seem to be supported for this leaderboard, going by its reproducibility instructions, and when I try it anyway (the flag does exist in the code -- https://github.com/mohamedalhajjar/lm-evaluation-harness-multilingual/blob/64286c9b9a270f9b72a9c4ba05e014b8284108da/lm_eval/__main__.py#L172) I get the error below.
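For context, the run I'm attempting is roughly equivalent to the Python call below (a sketch only: the model, tasks, and dtype are placeholders, and I'm assuming this fork keeps upstream's simple_evaluate signature with these two keyword arguments):

```python
# Sketch of the run; model, tasks, and dtype are placeholders, not my exact values.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=SOME_INSTRUCT_MODEL,dtype=bfloat16",
    tasks=["hellaswag"],        # placeholder task name
    num_fewshot=5,
    apply_chat_template=True,   # the option in question
    fewshot_as_multiturn=True,  # only meaningful together with apply_chat_template
)
```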

```
[rank6]: Traceback (most recent call last):
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank6]:     return _run_code(code, main_globals, None,
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/runpy.py", line 86, in _run_code
[rank6]:     exec(code, run_globals)
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/__main__.py", line 461, in <module>
[rank6]:     cli_evaluate()
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/__main__.py", line 382, in cli_evaluate
[rank6]:     results = evaluator.simple_evaluate(
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/utils.py", line 397, in _wrapper
[rank6]:     return fn(*args, **kwargs)
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/evaluator.py", line 288, in simple_evaluate
[rank6]:     evaluation_tracker.general_config_tracker.log_experiment_args(
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/loggers/evaluation_tracker.py", line 97, in log_experiment_args
[rank6]:     self.chat_template_sha = hash_string(chat_template) if chat_template else None
[rank6]:   File "/opt/conda/envs/openllm/lib/python3.10/site-packages/lm_eval/utils.py", line 36, in hash_string
[rank6]:     return hashlib.sha256(string.encode("utf-8")).hexdigest()
[rank6]: AttributeError: 'dict' object has no attribute 'encode'
```
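From the traceback, the tokenizer's chat_template appears to be a dict (some tokenizers ship several named templates) while hash_string expects a str. When driving the harness from Python as in the sketch above, a temporary monkey-patch along these lines should avoid the crash (an untested sketch, not a proper fix; it assumes evaluation_tracker imports hash_string at module level, as the traceback suggests):

```python
# Untested workaround sketch: serialize dict-valued chat templates to a
# stable JSON string before the tracker hashes them.
import json

from lm_eval.loggers import evaluation_tracker

_original_hash_string = evaluation_tracker.hash_string

def _hash_string_tolerant(value):
    # Some tokenizers expose chat_template as a dict of named templates;
    # sha256 needs a str, so dump dicts to JSON first.
    if isinstance(value, dict):
        value = json.dumps(value, sort_keys=True)
    return _original_hash_string(value)

evaluation_tracker.hash_string = _hash_string_tolerant
```

A proper fix in the harness would presumably normalize or select the template before it ever reaches the tracker.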
le-leadboard org

Thank you for raising this. Could you please open an issue on the GitHub repo so it can be fixed? Thanks!

malhajar changed discussion status to closed


There's no Issues tab on the GitHub repository.

le-leadboard org

I've added an Issues tab now. Could you check again?

le-leadboard org

Thanks a lot for raising this, I will push a fix really soon :)
