---
library_name: transformers
license: other
---

# Daredevil-8B-abliterated


Abliterated version of mlabonne/Daredevil-8B using failspy's notebook.

It is based on the technique described in the blog post "Refusal in LLMs is mediated by a single direction".
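
In short, the method estimates a single "refusal direction" in the model's residual stream and removes it from the weights. The sketch below illustrates the weight-orthogonalization step only, assuming a `refusal_dir` vector has already been extracted (e.g., as the difference of mean activations on harmful vs. harmless prompts); module names follow the Hugging Face Llama layout, and this is not the exact notebook code.

```python
# Illustrative sketch of weight orthogonalization against a refusal direction.
# Assumes `refusal_dir` (shape: hidden_size) was extracted beforehand, e.g. as
# the difference of mean residual-stream activations on harmful vs. harmless
# prompts. Module names follow the Hugging Face Llama implementation.
import torch

def orthogonalize(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of `weight`'s output space along `direction`."""
    direction = (direction / direction.norm()).to(weight)
    # For a Linear weight of shape (out_features, in_features), the output
    # lives in `out_features`; subtract the rank-1 projection onto `direction`.
    return weight - torch.outer(direction, direction @ weight)

@torch.no_grad()
def abliterate(model, refusal_dir: torch.Tensor):
    # Token embeddings write to the residual stream row-wise, so transpose.
    emb = model.model.embed_tokens.weight
    emb.copy_(orthogonalize(emb.T, refusal_dir).T)
    # In every block, orthogonalize the matrices that write to the residual
    # stream: the attention output and MLP down projections.
    for layer in model.model.layers:
        w_o = layer.self_attn.o_proj.weight
        w_o.copy_(orthogonalize(w_o, refusal_dir))
        w_d = layer.mlp.down_proj.weight
        w_d.copy_(orthogonalize(w_d, refusal_dir))
```

Because the edit is baked directly into the weights, the resulting model needs no inference-time hooks.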

Thanks to Andy Arditi, Oscar Balcells Obeso, Aaquib111, Wes Gurnee, Neel Nanda, and failspy.

## πŸ”Ž Applications

This is an uncensored model. You can use it for any application that doesn't require alignment, like role-playing.

Tested on LM Studio using the "Llama 3" preset.
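
Outside LM Studio, the model can be loaded with πŸ€— Transformers like any Llama 3 instruct model. A minimal sketch (the prompt and generation settings are illustrative, not values from this card):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mlabonne/Daredevil-8B-abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The tokenizer applies the Llama 3 chat template.
messages = [{"role": "user", "content": "Write a short pirate joke."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```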

## ⚑ Quantization
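
If you don't use a pre-quantized build, one option is on-the-fly 4-bit loading with bitsandbytes; a minimal sketch (the NF4 settings are common defaults, not values from this card):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize to 4-bit NF4 at load time; requires the `bitsandbytes` package.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mlabonne/Daredevil-8B-abliterated",
    quantization_config=bnb_config,
    device_map="auto",
)
```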

πŸ† Evaluation

### Open LLM Leaderboard

Daredevil-8B-abliterated is the second-best-performing 8B model on the Open LLM Leaderboard in terms of MMLU score (27 May 2024).

*(screenshot: Open LLM Leaderboard results)*

### Nous

Evaluation performed using LLM AutoEval. See the entire leaderboard here.

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
| --- | ---: | ---: | ---: | ---: | ---: |
| mlabonne/Daredevil-8B πŸ“„ | 55.87 | 44.13 | 73.52 | 59.05 | 46.77 |
| mlabonne/Daredevil-8B-abliterated πŸ“„ | 55.06 | 43.29 | 73.33 | 57.47 | 46.17 |
| mlabonne/Llama-3-8B-Instruct-abliterated-dpomix πŸ“„ | 52.26 | 41.6 | 69.95 | 54.22 | 43.26 |
| meta-llama/Meta-Llama-3-8B-Instruct πŸ“„ | 51.34 | 41.22 | 69.86 | 51.65 | 42.64 |
| failspy/Meta-Llama-3-8B-Instruct-abliterated-v3 πŸ“„ | 51.21 | 40.23 | 69.5 | 52.44 | 42.69 |
| mlabonne/OrpoLlama-3-8B πŸ“„ | 48.63 | 34.17 | 70.59 | 52.39 | 37.36 |
| meta-llama/Meta-Llama-3-8B πŸ“„ | 45.42 | 31.1 | 69.95 | 43.91 | 36.7 |

## 🌳 Model family tree

*(image: model family tree)*