This is our script for evaluating on the 32k long-context task.
First, install the dependencies from requirement.txt:
pip install -r requirement.txt
Then set model_path and data_home in eval_retro_vllm.sh and run the following command
bash run_eval_vllm.sh
to get the corresponding score.
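
Before launching the evaluation, it can help to confirm that the two configured paths actually exist. The snippet below is only a sketch of such a check and is not part of this repository: the check_dir helper and the placeholder paths are hypothetical, and it assumes that model_path and data_home both refer to local directories. Replace the placeholders with the values you set in eval_retro_vllm.sh.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; the paths below are placeholders, not values from this repo.
model_path="${model_path:-/path/to/model}"   # assumed: local directory holding the model
data_home="${data_home:-/path/to/data}"      # assumed: local directory holding the evaluation data

# Return 0 if the directory exists; otherwise print a message to stderr and return 1.
check_dir() {
    local name="$1" dir="$2"
    if [ -d "$dir" ]; then
        return 0
    fi
    echo "$name does not point to an existing directory: $dir" >&2
    return 1
}

if check_dir model_path "$model_path" && check_dir data_home "$data_home"; then
    echo "Paths look fine; run: bash run_eval_vllm.sh"
else
    echo "Fix the paths in eval_retro_vllm.sh first."
fi
```

If model_path is meant to be something other than a local directory, the first check does not apply and can be dropped.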