Model evaluation failure: request for additional details

#373
by siddartha-abacus - opened

Hi,

I am looking for some additional info on why this model failed:
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/abacusai/Giraffe-beta-13b-32k_eval_request_False_float16_Original.json

I was able to locally run:

python main.py --model=hf-causal --model_args="pretrained=abacusai/Giraffe-beta-13b-32k,revision=main" --tasks=arc_challenge --num_fewshot=5 --batch_size=1 --output_path=/tmp/harness-test

So I don't think there was an issue with loading the model. However, I was unable to use the use_accelerate=True flag. Adding the flag did not cause a model-loading issue, but the run failed with this error (I do believe I have the right branch):

Traceback (most recent call last):
  File "/root/lm-evaluation-harness/main.py", line 93, in <module>
    main()
  File "/root/lm-evaluation-harness/main.py", line 59, in main
    results = evaluator.simple_evaluate(
  File "/root/lm-evaluation-harness/lm_eval/utils.py", line 243, in _wrapper
    return fn(*args, **kwargs)
  File "/root/lm-evaluation-harness/lm_eval/evaluator.py", line 72, in simple_evaluate
    lm = lm_eval.models.get_model(model).create_from_arg_string(
  File "/root/lm-evaluation-harness/lm_eval/base.py", line 115, in create_from_arg_string
    return cls(**args, **args2)
TypeError: HFLM.__init__() got an unexpected keyword argument 'use_accelerate'

If you could give me any additional info, I would like to fix up the model config and try submitting again. The model itself is based on Llama2, so there should not be anything unusual about the code. We would like to follow up with a new version that has customizations to support injected soft-prompt tokens, but before we figure that out it seems we should first get a simpler instruct-tuned model through the harness.

If I switch to --model hf-causal-experimental then I am able to pass the use_accelerate=True flag.
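For reference, the combination that worked for me locally is essentially the command above with the model type and flag swapped in (a sketch of my local invocation, not necessarily the exact command the leaderboard runs):

python main.py --model=hf-causal-experimental --model_args="pretrained=abacusai/Giraffe-beta-13b-32k,revision=main,use_accelerate=True" --tasks=arc_challenge --num_fewshot=5 --batch_size=1 --output_path=/tmp/harness-test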

Open LLM Leaderboard org

Hi!
We have a new system to run our evaluations on the HF cluster, where the leaderboard evaluations get cancelled automatically if a higher priority job needs resources.
The jobs get relaunched automatically in the end, but they are displayed as failed in the meantime. I just checked: your run was cancelled but not relaunched yet; it will be relaunched automatically when we have enough compute available :)

Phew! Thank you. By the way, are the instructions wrong regarding the model type (hf-causal-experimental + use_accelerate)? Perhaps worth updating?

siddartha-abacus changed discussion status to closed
siddartha-abacus changed discussion status to open

This model is marked as failed again:
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/abacusai/Giraffe-beta-13b-32k_eval_request_False_float16_Original.json

I was able to run the eval command line on our machines and generate a valid result for arc_challenge. Any additional information on the failure on your cluster?

Open LLM Leaderboard org

Hi!
It's a node failure, I'll restart it!

clefourrier changed discussion status to closed
siddartha-abacus changed discussion status to open

Unfortunately, this seems to have failed again. I will try re-running the full suite locally to confirm, but any info you have on the failure would be very helpful. Other than the context-length change this should be very similar to any other Llama2-based model, so I am surprised by the failure. Should I try resubmitting the model with a shorter max context length?

FYI: we trained on H100s, but I verified execution on an A100 80G machine. Should I test with a 40G memory limit instead?

Open LLM Leaderboard org

Hi, we had an issue while downloading the model. The problem appears to be on our end; your model is back in the queue. If you notice it fails again, don't hesitate to warn us! :)

Open LLM Leaderboard org

Closing, feel free to reopen if needed :)

clefourrier changed discussion status to closed

Seems pretty jinxed :)

I think it is marked as failed again. In the background we have gone ahead and run the whole suite on our machines, and it seems to work fine. We are running on 80G GPUs; not sure if that is the issue. Let me know if you think we should update the model config to indicate a shorter context. The model has been tested to about ~32K context.
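If shortening the advertised context does turn out to be useful, a minimal sketch of what that could look like, assuming the standard Llama2 config field max_position_embeddings (the 4096 value is an illustrative placeholder, not a recommendation):

from transformers import AutoConfig

# Load the current config and override the advertised context length.
cfg = AutoConfig.from_pretrained("abacusai/Giraffe-beta-13b-32k")
cfg.max_position_embeddings = 4096  # hypothetical shorter context, for illustration only
cfg.save_pretrained("./giraffe-short-context")  # would then be re-uploaded alongside the weights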

siddartha-abacus changed discussion status to open
Open LLM Leaderboard org
edited Nov 30, 2023

It's quite weird, we got a download issue again. I'll try it a third time, but if it's still not working we'll have to assume there is an issue with the download of your model.

Logs:

Traceback (most recent call last):
  File "...lib/python3.10/site-packages/requests/models.py", line 822, in generate
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='cdn-lfs-us-1.huggingface.co', port=443): Read timed out.
Downloading (…)of-00003.safetensors:  79%|███████▊  | 7.81G/9.95G [00:50<00:13, 156MB/s]
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2492624 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2492625 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2492626 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2492627 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2492628 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2492629 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2492630 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2492623) of binary: python

We are also running experiments on 80G GPUs. I don't think it's a context problem, since the model is not even launched ^^

Thanks for the logs. We have successfully re-downloaded from HF for testing. I wonder if we somehow ended up on a flaky LFS node. If this continues to be an issue, perhaps we will try re-uploading the files in the hope they end up on a new node?
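One straightforward way to reproduce the download end-to-end outside the harness is a plain snapshot pull with huggingface_hub (a sketch; this is not necessarily how the leaderboard cluster fetches the weights):

from huggingface_hub import snapshot_download

# Pull the full repo, including the sharded safetensors files, into the local cache.
# If this completes without a read timeout, the LFS side is probably healthy.
local_path = snapshot_download(
    repo_id="abacusai/Giraffe-beta-13b-32k",
    revision="main",
)
print(f"Model files downloaded to {local_path}")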

Open LLM Leaderboard org

Hi @siddartha-abacus , can I close this issue? Has your model been properly evaluated?

Yes, sorry, I did not realize it was not closed.

siddartha-abacus changed discussion status to closed
Open LLM Leaderboard org

No problem, thank you for closing :)
