Un-censoring methods and effects on performance

#2
by JustJaro - opened

Do you think you could shed some light on how your technique compares to refusal-direction abliteration, especially when run with two passes (as in zetasepic/Qwen2.5-72B-Instruct-abliterated-v2)?
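For anyone unfamiliar with the comparison point, here's a rough numpy sketch of how I understand refusal-direction abliteration to work (this is my own illustration, not zetasepic's actual code, and the prompt sets and hidden size are made up): estimate a "refusal direction" from the difference of mean activations on refused vs. complied prompts, then project that direction out of the model's weights.

```python
# Toy sketch of refusal-direction abliteration (my understanding, not the
# exact method used in the abliterated-v2 model).
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size; real models use thousands of dims

# Hypothetical mean residual-stream activations over two prompt sets
h_refused = rng.normal(size=d)   # prompts the model refuses
h_complied = rng.normal(size=d)  # matched prompts it answers

# Refusal direction = normalized difference of means
r = h_refused - h_complied
r /= np.linalg.norm(r)

# Ablate: remove the component of a weight matrix's output along r,
#   W_abl = W - r r^T W,  so that r . (W_abl x) = 0 for any input x
W = rng.normal(size=(d, d))
W_abl = W - np.outer(r, r) @ W

# Sanity check: the ablated weights can no longer write into direction r
x = rng.normal(size=d)
print(abs(r @ (W_abl @ x)))  # ~0
```

A "2-pass" variant, as I read it, would re-estimate the direction on the already-ablated model and project out the new direction as well, catching residual refusal behavior the first pass missed.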

Which variant do you reckon retains more of the original "smarts"?

The bigger question is how to assess this (LiveBench or MMLU-Pro?); I'm wondering what your thinking is on this.

Looking at the EETQ suffix, I'm guessing you've just run an "un-censoring" Dolphin fine-tune quantized with EETQ. Is Qwen as deeply censored as Llama? I also just noticed this is Qwen2, not 2.5.

What's your situation: is it a lack of compute, a change of interest, or a change of heart w.r.t. un-censoring models (or, you know, just life getting in the way and the hype dying down :D)?
