TITLE = """
🤗 LLM-Perf Leaderboard 🏋️
"""
INTRODUCTION_TEXT = f"""
The 🤗 LLM-Perf Leaderboard 🏋️ aims to benchmark the performance (latency, throughput & memory) of Large Language Models (LLMs) across different hardware, backends and optimizations using [Optimum-Benchmark](https://github.com/huggingface/optimum-benchmark) and [Optimum](https://github.com/huggingface/optimum) flavors.
Anyone from the community can request a model or a hardware/backend/optimization configuration for automated benchmarking:
- Model evaluation requests should be made in the [🤗 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) and will be added to the 🤗 LLM-Perf Leaderboard 🏋️ automatically.
- Hardware/Backend/Optimization performance requests should be made in the [community discussions](https://huggingface.co/spaces/optimum/llm-perf-leaderboard/discussions) to assess their relevance and feasibility.
"""
ABOUT_TEXT = """About the 🤗 LLM-Perf Leaderboard 🏋️
- To avoid communication-dependent results, only one GPU is used.
- Score is the average evaluation score obtained from the 🤗 Open LLM Leaderboard.
- LLMs are run with a batch size of 1 and a prompt size of 512 tokens, generating 1000 new tokens.
- Peak memory is measured in MB during the generate pass using Py3NVML while ensuring the GPU's isolation.
- Energy consumption is measured in kWh using CodeCarbon and taking into consideration the GPU, CPU, RAM and location of the machine.
- Each pair of (Model Type, Weight Class) is represented by its best-scoring model. This LLM is the one used for all the hardware/backend/optimization experiments.
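
The last point above can be illustrated with a minimal sketch, assuming each candidate is described by a hypothetical (model, model type, weight class, score) row. The model names, scores and the `best_per_group` helper are made up for illustration and are not taken from the leaderboard or its code.

```python
# Hypothetical rows: (model, model_type, weight_class, score).
# These values are illustrative only, not real leaderboard data.
rows = [
    ("model-a", "llama", "7B", 55.0),
    ("model-b", "llama", "7B", 58.5),
    ("model-c", "llama", "13B", 60.1),
    ("model-d", "gpt_neox", "7B", 47.2),
]


def best_per_group(rows):
    # Return the highest-scoring (model, score) for each (model_type, weight_class) pair.
    best = {}
    for model, model_type, weight_class, score in rows:
        key = (model_type, weight_class)
        # Keep this row only if the pair is new or the score beats the current best.
        if key not in best or score > best[key][1]:
            best[key] = (model, score)
    return best


representatives = best_per_group(rows)
for (model_type, weight_class), (model, score) in sorted(representatives.items()):
    print(model_type, weight_class, model, score)
```

Under this assumption, only the selected representative of each pair would be carried into the hardware/backend/optimization experiments.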
"""
EXAMPLE_CONFIG_TEXT = """
Here's an example of the configuration file used to benchmark the models with Optimum-Benchmark:
```yaml
defaults:
  - backend: pytorch # default backend
  - benchmark: inference # default benchmark
  - experiment # inheriting from experiment config
  - _self_ # for hydra 1.1 compatibility
  - override hydra/job_logging: colorlog # colorful logging
  - override hydra/hydra_logging: colorlog # colorful logging
hydra:
  run:
    dir: llm-experiments/{experiment_name}
  job:
    chdir: true
experiment_name: {experiment_name}
model: {model}
device: cuda
backend:
  no_weights: true
  delete_cache: true
  torch_dtype: float16
  quantization_strategy: gptq
  bettertransformer: true
benchmark:
  memory: true
  input_shapes:
    batch_size: 1
    sequence_length: 512
  new_tokens: 1000
```
"""
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results."
CITATION_BUTTON_TEXT = r"""@misc{llm-perf-leaderboard,
  author = {Ilyas Moutawwakil and Régis Pierrard},
  title = {LLM-Perf Leaderboard},
  year = {2023},
  publisher = {Hugging Face},
  howpublished = "\url{https://huggingface.co/spaces/optimum/llm-perf-leaderboard}",
}
@software{optimum-benchmark,
  author = {Ilyas Moutawwakil and Régis Pierrard},
  publisher = {Hugging Face},
  title = {Optimum-Benchmark: A framework for benchmarking the performance of Transformers models with different hardwares, backends and optimizations.},
}
"""