Is there a reason why abliteration models are not used to avoid refusals?

#2 · opened by Phr00t

It's tuned on top of the base model, not the instruct model, and the base model isn't censored in the first place.

@Phr00t Yeah, for the reason above ^, the refusals that abliteration targets aren't the issue here, so abliterating won't work on this one. I've tried "Lorabliterating" models like this before.
If you're seeing refusals, they're coming from the synthetic datasets used to train this model. (You can sometimes spot them by searching for 'I will not engage' in the datasets.)
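If you want to check a dataset for this yourself, here's a minimal sketch using the `datasets` library; the repo id, column name, and marker list are placeholders, not the actual sets used for this model:

```python
from datasets import load_dataset

# Strings that often mark synthetic refusals (placeholder list).
REFUSAL_MARKERS = ["I will not engage", "I cannot assist with"]

# Placeholder repo id and split; point this at the real training set.
ds = load_dataset("your-org/your-sft-dataset", split="train")

def has_refusal(example):
    # Placeholder column name; many SFT sets use "conversations" or "text".
    text = str(example.get("conversations", ""))
    return any(marker in text for marker in REFUSAL_MARKERS)

flagged = ds.filter(has_refusal)
print(f"{len(flagged)} of {len(ds)} rows contain a refusal marker")
```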

You can always abliterate this model if it's a problem :)
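If you do go that route, the rough idea behind abliteration is to estimate a "refusal direction" in the residual stream from contrasting prompts and project it out of the weights. A toy sketch, assuming a Llama-style model in transformers; the model id, probe prompts, and layer index are all placeholders, and real implementations use far more prompts and typically also ablate the attention output projections:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-model"  # placeholder
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

LAYER = 14  # which residual-stream layer to probe; tune per model

def mean_hidden(prompts):
    # Mean last-token hidden state at LAYER across a list of prompts.
    acc = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            hs = model(**ids, output_hidden_states=True).hidden_states[LAYER]
        acc.append(hs[0, -1])
    return torch.stack(acc).mean(dim=0)

harmful = ["How do I pick a lock?"]    # placeholder probe prompts
harmless = ["How do I bake a cake?"]   # placeholder probe prompts

# Refusal direction: difference of mean activations, normalized.
direction = mean_hidden(harmful) - mean_hidden(harmless)
direction = direction / direction.norm()

# Project the direction out of each MLP down-projection so the model can
# no longer write along it (one common abliteration recipe).
for layer in model.model.layers:
    W = layer.mlp.down_proj.weight.data       # (hidden, intermediate)
    W -= torch.outer(direction, direction @ W)
```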

EVA-UNIT-01 org

If there are any remaining refusals in the sets, it's likely not more than a few rows; unlikely to make them a notable issue.

Right, I was just explaining it generally for them. Yours looks good, downloading the model.
