Quicktour
We recommend using the --help flag to get more information about the available options for each command.
lighteval --help
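The same flag works for every subcommand; for example, to list the options specific to the accelerate backend:
lighteval accelerate --help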
Lighteval can be used with a few different commands:

- lighteval accelerate: evaluate models on CPU or one or more GPUs using 🤗 Accelerate
- lighteval nanotron: evaluate models in distributed settings using ⚡️ Nanotron
- lighteval vllm: evaluate models on one or more GPUs using 🚀 VLLM
- lighteval endpoint
  - inference-endpoint: evaluate models using Hugging Face’s Inference Endpoints API
  - tgi: evaluate models using 🔗 Text Generation Inference running locally
  - litellm: evaluate models on any compatible API using litellm
Basic usage
To evaluate GPT-2 on the Truthful QA benchmark with 🤗 Accelerate, run:
lighteval accelerate \
"model_name=openai-community/gpt2" \
"leaderboard|truthfulqa:mc|0|0"
Here, we first choose a backend (either accelerate, nanotron, or vllm), and then specify the model and task(s) to run.
The syntax for the model arguments is key1=value1,key2=value2, etc.
Valid key-value pairs correspond to the configuration of the backend you use, and are detailed in the Backend configuration section below.
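For example, with the accelerate backend you can pass several pairs at once; revision and dtype below are fields of the transformers model config, but the exact set of accepted keys depends on your backend and version:
"model_name=openai-community/gpt2,revision=main,dtype=float16"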
The syntax for the task specification might be a bit hard to grasp at first. The format is as follows:
{suite}|{task}|{num_few_shot}|{0 for strict `num_few_shots`, or 1 to allow a truncation if context size is too small}
If the fourth value is set to 1, lighteval will check if the prompt (including the few-shot examples) is too long for the context size of the task or the model. If so, the number of few shot examples is automatically reduced.
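For example, "leaderboard|truthfulqa:mc|0|0" runs Truthful QA zero-shot with a strict few-shot count, while "leaderboard|gsm8k|3|1" requests 3 few-shot examples but lets lighteval reduce that number if the prompt would otherwise overflow the context.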
All officially supported tasks can be found in the tasks_list and in the extended folder. Moreover, community-provided tasks can be found in the community folder. For more details on the implementation of a task, such as how prompts are constructed or which metrics are used, you can have a look at the file implementing it.
Running multiple tasks is supported, either with a comma-separated list or by specifying a file path. The file should be structured like examples/tasks/recommended_set.txt. When specifying a path to a file, it should start with ./.
lighteval accelerate \
"model_name=openai-community/gpt2" \
./path/to/lighteval/examples/tasks/recommended_set.txt
# or, e.g., "leaderboard|truthfulqa:mc|0|0,leaderboard|gsm8k|3|1"
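The task file itself simply lists one task specification per line; a minimal version of recommended_set.txt could look like this (entries shown for illustration):
leaderboard|truthfulqa:mc|0|0
leaderboard|gsm8k|3|1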
Evaluate a model on one or more GPUs
Data parallelism
To evaluate a model on one or more GPUs, first create a multi-GPU config by running:
accelerate config
You can then evaluate a model using data parallelism on 8 GPUs as follows:
accelerate launch --multi_gpu --num_processes=8 -m \
lighteval accelerate \
"model_name=openai-community/gpt2" \
"leaderboard|truthfulqa:mc|0|0"
Here, --num_processes sets the number of data-parallel workers (one per GPU). If you additionally pass lighteval’s --override_batch_size flag, it defines the batch size per device, so the effective batch size will be override_batch_size * num_gpus.
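For instance, adding --override_batch_size 4 to the 8-GPU run above would give each device batches of 4 samples, for an effective batch size of 4 * 8 = 32.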
Pipeline parallelism
To evaluate a model using pipeline parallelism on 2 or more GPUs, run:
lighteval accelerate \
"model_name=openai-community/gpt2,model_parallel=True" \
"leaderboard|truthfulqa:mc|0|0"
This will automatically use accelerate to distribute the model across the GPUs.
Both data and pipeline parallelism can be combined by setting model_parallel=True and using accelerate to distribute the data across the GPUs.
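A minimal sketch of such a combined run, assuming 4 GPUs split into 2 data-parallel processes with the model itself sharded across the devices (the numbers are illustrative):
accelerate launch --multi_gpu --num_processes=2 -m \
    lighteval accelerate \
    "model_name=openai-community/gpt2,model_parallel=True" \
    "leaderboard|truthfulqa:mc|0|0"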
Backend configuration
The model-args argument takes a string representing a list of model arguments. The arguments allowed vary depending on the backend you use and correspond to the fields of the model configs.
The model configs can be found here.
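For example, the vllm backend accepts fields of its model config such as tensor_parallel_size and dtype; treat the exact keys as version-dependent and check them with --help:
lighteval vllm \
    "model_name=openai-community/gpt2,tensor_parallel_size=2,dtype=bfloat16" \
    "leaderboard|truthfulqa:mc|0|0"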
Nanotron
To evaluate a model trained with nanotron on a single GPU:
Nanotron models cannot be evaluated without torchrun.
torchrun --standalone --nnodes=1 --nproc-per-node=1 \
    src/lighteval/__main__.py nanotron \
    --checkpoint-config-path ../nanotron/checkpoints/10/config.yaml \
    --lighteval-config-path examples/nanotron/lighteval_config_override_template.yaml
The nproc-per-node argument should match the data, tensor, and pipeline parallelism configured in the lighteval_config_override_template.yaml file.
That is: nproc-per-node = data_parallelism * tensor_parallelism * pipeline_parallelism.
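For instance, a parallelism block like the following (the dp/tp/pp field names are an assumption based on nanotron’s parallelism config; check the template shipped with your version) would require --nproc-per-node=4, since 2 * 2 * 1 = 4:
parallelism:
  dp: 2
  tp: 2
  pp: 1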