|
from dataclasses import dataclass |
|
from enum import Enum |
|
|
|
|
|
@dataclass |
|
class Task: |
|
benchmark: str |
|
metric: str |
|
col_name: str |
|
|
|
|
|
|
|
class Tasks(Enum): |
|
|
|
task0 = Task("toxicity", "aggregated-results", "Non-toxicity") |
|
task1 = Task("stereotype", "aggregated-results", "Non-Stereotype") |
|
task2 = Task("adv", "aggregated-results", "AdvGLUE++") |
|
task3 = Task("ood", "aggregated-results", "OoD") |
|
task4 = Task("adv_demo", "aggregated-results", "Adv Demo") |
|
task5 = Task("privacy", "aggregated-results", "Privacy") |
|
task6 = Task("ethics", "aggregated-results", "Ethics") |
|
task7 = Task("fairness", "aggregated-results", "Fairness") |
|
|
|
|
|
|
|
TITLE = """<h1 align="center" id="space-title">LLM Safety Leaderboard</h1>""" |
|
|
|
|
|
INTRODUCTION_TEXT = """The LLM Safety Leaderboard aims to provide a unified evaluation for LLM safety and help |
|
researchers and practitioners better understand the capabilities, limitations, and potential risks of LLMs. Submit a |
|
model for evaluation on the “Submit” page! The leaderboard is generated based on the trustworthiness evaluation |
|
platform [DecodingTrust](https://decodingtrust.github.io/).""" |
|
|
|
|
|
LLM_BENCHMARKS_TEXT = f""" |
|
## Context |
|
|
|
As LLMs have demonstrated impressive capabilities and are being deployed in |
|
high-stakes domains such as healthcare, transportation, and finance, understanding the safety, limitations, |
|
and potential risks of LLMs is crucial. |
|
|
|
## How it works |
|
|
|
This leaderboard is powered by the DecodingTrust platform, which provides comprehensive safety and trustworthiness |
|
evaluation for LLMs. More details about the paper, which has won the Outstanding Paper award at NeurIPs’23, |
|
and the platform can be found [here](https://decodingtrust.github.io/). |
|
|
|
DecodingTrust aims to provide comprehensive risk and trustworthiness assessment for LLMs. Currently, it includes the |
|
following eight primary perspectives of trustworthiness, including: |
|
+ Toxicity |
|
+ Stereotype and bias |
|
+ Adversarial robustness |
|
+ Out-of-Distribution Robustness |
|
+ Privacy |
|
+ Robustness to Adversarial Demonstrations |
|
+ Machine Ethics |
|
+ Fairness |
|
|
|
We normalize the evaluation score of each perspective between 0-100, which means the higher the better. |
|
|
|
Examples of these vulnerabilities are shown below. |
|
![DecodingTrust Overview](https://decodingtrust.github.io/overview.png) |
|
|
|
|
|
## Reproducibility |
|
Please follow the following instructions to install DecodingTrust |
|
|
|
### (Conda +) Pip |
|
|
|
For now, we suggest installing DecodingTrust by cloning our repository and install it in editable mode. This will keep the data, code, and configurations in the same place. |
|
|
|
```bash |
|
git clone https://github.com/AI-secure/DecodingTrust.git && cd DecodingTrust |
|
pip install -e . |
|
``` |
|
|
|
Please note that this will install PyTorch with `pip`. If your system does not have a `CUDA` version compatible with the PyTorch `pip` wheel. To install `PyTorch` with `Conda` first, as shown below. |
|
|
|
```bash |
|
conda create --name dt-test python=3.9 pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia |
|
conda activate dt-test |
|
pip install "decoding-trust @ git+https://github.com/AI-secure/DecodingTrust.git" |
|
``` |
|
|
|
It is also possible to install DecodingTrust as a standalone package, but you will need to clone our repository again to run it with our data. |
|
```bash |
|
conda create --name dt-test python=3.9 |
|
conda activate dt-test |
|
pip install "decoding-trust @ git+https://github.com/AI-secure/DecodingTrust.git" |
|
``` |
|
|
|
### Support for the `ppc64le` Architecture |
|
|
|
We also support the `ppc64le` architecture of IBM Power-9 platforms. To install on this platform, please first make |
|
sure you have the following `conda` channels so that we can utilize pre-built packages. |
|
|
|
``` |
|
--add channels 'defaults' # lowest priority |
|
--add channels 'https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda-early-access/' |
|
--add channels 'https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/' |
|
--add channels 'https://opence.mit.edu' |
|
--add channels 'https://ftp.osuosl.org/pub/open-ce/current/' |
|
--add channels 'conda-forge' # highest priority |
|
``` |
|
|
|
Then, install the following pre-built packages. |
|
|
|
```bash |
|
mamba create --name dt-test python==3.9 pytorch=2.0.1 torchvision=0.15.2 spacy=3.5.3 scipy=1.10.1 |
|
fairlearn~=0.9.0 scikit-learn~=1.1.2 pandas~=2.0.3 pyarrow~=11.0.0 rust -c conda-forge |
|
``` |
|
|
|
Finally, install DecodingTrust with `pip` as usual. |
|
|
|
### Docker / Singularity |
|
|
|
To use DecodingTrust with docker, simply pull the following docker image. |
|
```bash |
|
sudo docker pull danielz01/decoding-trust |
|
docker run -it \ |
|
-v /path/on/host:/path/in/container \ |
|
--gpus all \ |
|
decoding-trust/v1.0:latest [arg1 arg2 ...] |
|
``` |
|
To use it in through singularity or apptainer container environments on HPC environments, simply run the following. |
|
```bash |
|
module load singularity # Change it to whatever module name your singularity / apptainer environment was given |
|
singularity pull decoding-trust-v1.0.sif docker://danielz01/decoding-trust |
|
singularity exec --nv --bind /path/on/host:/path/in/container decoding-trust-v1.0.sif [arg1 arg2] |
|
``` |
|
|
|
To run our evaluations, checkout our [tutorial](https://github.com/AI-secure/DecodingTrust/blob/main/Tutorial.md). |
|
|
|
## Icons |
|
+ 🟢 Pretrained Model: Base models trained on a given corpora |
|
+ 🔶 Fine-tuned Model: Pretrained model fine-tuned on more data |
|
+ ⭕ Instruction-tuned Model: Models specifically fine-tuned on datasets with task instructions |
|
+ 🟦 RL-tuned: Models specifically fine-tuned with RLHF or DPO |
|
+ 🔒 Closed Model: Closed models that do not publish training recipies or model sizes |
|
|
|
If there is no icon, we have not uploaded the information on the model yet, feel free to open an issue with the model |
|
information |
|
|
|
## Changelog |
|
01/17/24: Add the initial version of the leaderboard |
|
|
|
""" |
|
|
|
EVALUATION_QUEUE_TEXT = """ |
|
## Some good practices before submitting a model |
|
|
|
### 1) Make sure you can load your model and tokenizer using AutoClasses: |
|
```python |
|
from transformers import AutoConfig, AutoModel, AutoTokenizer |
|
config = AutoConfig.from_pretrained("your model name", revision=revision) |
|
model = AutoModel.from_pretrained("your model name", revision=revision) |
|
tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision) |
|
``` |
|
If this step fails, follow the error messages to debug your model before submitting it. It's likely your model has been improperly uploaded. |
|
|
|
Note: make sure your model is public! |
|
Note: if your model needs `use_remote_code=True`, we do not support this option yet but we are working on adding it, stay posted! |
|
|
|
### 2) Convert your model weights to [safetensors](https://huggingface.co/docs/safetensors/index) |
|
It's a new format for storing weights which is safer and faster to load and use. It will also allow us to add the number of parameters of your model to the `Extended Viewer`! |
|
|
|
### 3) Make sure your model has an open license! |
|
This is a leaderboard for Open LLMs, and we'd love for as many people as possible to know they can use your model 🤗 |
|
|
|
### 4) Fill up your model card |
|
When we add extra information about models to the leaderboard, it will be automatically taken from the model card |
|
|
|
## In case of model failure |
|
If your model is displayed in the `FAILED` category, its execution stopped. |
|
Make sure you have followed the above steps first. |
|
If everything is done, check you can launch the EleutherAIHarness on your model locally, using the above command without modifications (you can add `--limit` to limit the number of examples per task). |
|
""" |
|
|
|
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results" |
|
CITATION_BUTTON_TEXT = r""" |
|
@article{wang2023decodingtrust, |
|
title={DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models}, |
|
author={Wang, Boxin and Chen, Weixin and Pei, Hengzhi and Xie, Chulin and Kang, Mintong and Zhang, Chenhui and Xu, Chejian and Xiong, Zidi and Dutta, Ritik and Schaeffer, Rylan and others}, |
|
booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track}, |
|
year={2023} |
|
} |
|
""" |
|
|