metadata
library_name: transformers
license: other
Daredevil-8B-abliterated
Abliterated version of mlabonne/Daredevil-8B using failspy's notebook.
It based on the technique described in the blog post "Refusal in LLMs is mediated by a single direction".
Thanks to Andy Arditi, Oscar Balcells Obeso, Aaquib111, Wes Gurnee, Neel Nanda, and failspy.
π Applications
This is an uncensored model. You can use it for any application that doesn't require alignment, like role-playing.
Tested on LM Studio using the "Llama 3" preset.
β‘ Quantization
π Evaluation
Open LLM Leaderboard
Daredevil-8B-abliterated is the second best-performing 8B model on the Open LLM Leaderboard in terms of MMLU score (27 May 24).
Nous
Eevaluation performed using LLM AutoEval. See the entire leaderboard here.
Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
---|---|---|---|---|---|
mlabonne/Daredevil-8B π | 55.87 | 44.13 | 73.52 | 59.05 | 46.77 |
mlabonne/Daredevil-8B-abliterated π | 55.06 | 43.29 | 73.33 | 57.47 | 46.17 |
mlabonne/Llama-3-8B-Instruct-abliterated-dpomix π | 52.26 | 41.6 | 69.95 | 54.22 | 43.26 |
meta-llama/Meta-Llama-3-8B-Instruct π | 51.34 | 41.22 | 69.86 | 51.65 | 42.64 |
failspy/Meta-Llama-3-8B-Instruct-abliterated-v3 π | 51.21 | 40.23 | 69.5 | 52.44 | 42.69 |
mlabonne/OrpoLlama-3-8B π | 48.63 | 34.17 | 70.59 | 52.39 | 37.36 |
meta-llama/Meta-Llama-3-8B π | 45.42 | 31.1 | 69.95 | 43.91 | 36.7 |