Un-censoring methods and effects on performance

#2
by JustJaro - opened

Do you think you could shed some light on how your technique compares to refusal-direction abliteration, especially when run with two passes (as in zetasepic/Qwen2.5-72B-Instruct-abliterated-v2)?
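For anyone unfamiliar with the comparison point, here's a rough numpy sketch of how I understand refusal-direction abliteration to work (this is my own illustration, not zetasepic's actual code, and the prompt sets and hidden size are made up): estimate a "refusal direction" from the difference of mean activations on refused vs. complied prompts, then project that direction out of the model's weights.

```python
# Toy sketch of refusal-direction abliteration (my understanding, not the
# exact method used in the abliterated-v2 model).
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size; real models use thousands of dims

# Hypothetical mean residual-stream activations over two prompt sets
h_refused = rng.normal(size=d)   # prompts the model refuses
h_complied = rng.normal(size=d)  # matched prompts it answers

# Refusal direction = normalized difference of means
r = h_refused - h_complied
r /= np.linalg.norm(r)

# Ablate: remove the component of a weight matrix's output along r,
#   W_abl = W - r r^T W,  so that r . (W_abl x) = 0 for any input x
W = rng.normal(size=(d, d))
W_abl = W - np.outer(r, r) @ W

# Sanity check: the ablated weights can no longer write into direction r
x = rng.normal(size=d)
print(abs(r @ (W_abl @ x)))  # ~0
```

A "2-pass" variant, as I read it, would re-estimate the direction on the already-ablated model and project out the new direction as well, catching residual refusal behavior the first pass missed.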

Which variant do you reckon retains more of the original "smarts"?

The bigger question is how to assess this (LiveBench or MMLU-Pro?); I'm wondering what your thinking is on this.

Looking at the EETQ suffix, I'm guessing you've just run an "un-censoring" Dolphin fine-tune quantized with EETQ. Is Qwen as deeply censored as Llama? I also just noticed this is Qwen2, not 2.5.

What's your situation: is it a lack of compute, a change of interest, or a change of heart w.r.t. un-censoring models (or, you know, just life getting in the way and the hype dying down :D)?
