Un-censoring methods and effects on performance
#2 · opened by JustJaro
Do you think you could shed some light on how your technique compares to refusal-direction abliteration, especially when run with two passes (as in zetasepic/Qwen2.5-72B-Instruct-abliterated-v2)? A rough sketch of what I mean is below.
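For reference, this is roughly the single-pass version of what I'm calling refusal-direction abliteration (just a sketch, not your or zetasepic's actual code; the activation tensors and names are illustrative):

```python
# Rough sketch of refusal-direction ablation ("abliteration").
# Assumes you've already cached residual-stream activations at some layer
# for harmful vs. harmless prompts; variable names are illustrative.
import torch

def refusal_direction(harmful_acts: torch.Tensor,
                      harmless_acts: torch.Tensor) -> torch.Tensor:
    """Difference-of-means direction between prompt sets, unit-normalized."""
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()

def ablate(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Project the refusal direction out of a weight matrix that writes
    into the residual stream: W <- (I - r r^T) W, so the model can no
    longer write along r."""
    r = direction.to(weight.dtype)
    return weight - torch.outer(r, r) @ weight

# A "2-pass" variant (as I understand it) re-extracts activations from the
# already-ablated model, recomputes the direction, and ablates again to
# catch refusals that survive the first pass.
```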
Which variant do you reckon retains more of the original "smarts"?
The bigger question is how to assess this (LiveBench or MMLU-Pro?) — I'm wondering what your thinking is on this.
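My naive plan would be something like the sketch below with EleutherAI's lm-evaluation-harness (exact task names and the `simple_evaluate` call may differ across harness versions, so treat this as an assumption on my part):

```python
# Hedged sketch: compare base vs. abliterated checkpoints on MMLU-Pro.
# Requires lm-evaluation-harness (pip install lm-eval); "mmlu_pro" task
# availability depends on the installed version.
import lm_eval

for name in ["Qwen/Qwen2.5-72B-Instruct",
             "zetasepic/Qwen2.5-72B-Instruct-abliterated-v2"]:
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={name},dtype=bfloat16",
        tasks=["mmlu_pro"],
    )
    print(name, results["results"])
```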
Looking at the EETQ suffix, I'm guessing you've just run an "un-censoring" Dolphin fine-tune quantized with EETQ. Is Qwen as deeply censored as Llama? I also just noticed this is Qwen 2, not 2.5.
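(By EETQ I mean int8 weight quantization loaded roughly like this via transformers — the repo id below is a placeholder, and `EetqConfig` availability depends on your transformers version and the `eetq` package being installed:)

```python
# Hedged sketch of loading an EETQ int8-quantized checkpoint.
from transformers import AutoModelForCausalLM, EetqConfig

model = AutoModelForCausalLM.from_pretrained(
    "some-user/dolphin-finetune",  # placeholder: substitute the real repo id
    quantization_config=EetqConfig("int8"),
    device_map="auto",
)
```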
What's your situation: is it a lack of compute, a change of interest, or a moral change of heart w.r.t. un-censoring models (or, you know, just life getting in the way and the hype dying down :D)?