Suggestion for censorship disclosure - odd responses from R1
I observed something disturbing. A locally running R1 instance is reproducing verbatim boilerplate output that cannot be objectively substantiated. I will not go into the specifics here because I do not want this to descend into a "political" melee.
I would like to suggest that the DeepSeek team openly disclose which boilerplate responses are hardcoded into the model. Right now I am anxious about what else is broken that is not this obvious. For a reasoning model, it seems disappointingly easy to make it stochastically reproduce content that reflects no ability to reason.
My concern is not with the obvious requirement to adhere to the views of the home country, but with what else is irrevocably and less visibly broken inside R1. This is the disclosure I would like DeepSeek to make: share what you had to break in order to release the model.
While I do not blindly trust closed AI companies, they do make disclosures and publish research on alignment, so users can make somewhat informed choices about use cases.
Personally, I am not that concerned about "politics", but about what code R1 would deliberately break while appearing to function "normally".
All models are aligned to match the norms (both social and geopolitical) of their makers and the environment they are created within. This is entirely expected and requires no disclosure (if anything, a model would require disclosure if it lacked such alignment).
By making this model open source, the DeepSeek team have given us unlimited access, and we can censor, uncensor, recensor and remix their alignment as we please.
https://huggingface.co./nicoboss has already released uncensored versions of DeepSeek-R1-Distilled-Qwen-7B and DeepSeek-R1-Distilled-Qwen-7B-reasoner, and https://huggingface.co./mradermacher quantized them. So if you want, you can set the table, light the candles, open LMStudio, and spend the evening talking to R1 about the events at Tiananmen Square, followed by a night of unbridled passion.
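If you would rather script it than use LMStudio, here is a minimal sketch using llama-cpp-python. The repo id and quant filename below are placeholders I have not verified; check mradermacher's actual GGUF repos for the real names:

```python
# Minimal sketch: download a quantized GGUF and chat with it locally.
# Repo id and filename are HYPOTHETICAL placeholders -- verify them
# against the actual repos on https://huggingface.co./mradermacher.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="mradermacher/DeepSeek-R1-Distill-Qwen-7B-Uncensored-GGUF",  # placeholder
    filename="DeepSeek-R1-Distill-Qwen-7B-Uncensored.Q4_K_M.gguf",       # placeholder
)

llm = Llama(model_path=model_path, n_ctx=8192)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What happened at Tiananmen Square in 1989?"}],
    max_tokens=1024,
)
print(out["choices"][0]["message"]["content"])
```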
It's only a matter of time till someone abliterates the full model.
Suggested reading: https://erichartford.com/uncensored-models
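For the curious: abliteration roughly means identifying a "refusal direction" in the model's activation space (e.g. the difference of mean activations between refused and answered prompts) and projecting it out of the weights. A toy sketch of the idea, simplified and not taken from any actual abliteration script:

```python
import torch

def refusal_direction(refused_acts: torch.Tensor,
                      answered_acts: torch.Tensor) -> torch.Tensor:
    # Crude stand-in: difference of mean activations between prompts the
    # model refused and prompts it answered, normalized to unit length.
    d = refused_acts.mean(dim=0) - answered_acts.mean(dim=0)
    return d / d.norm()

def ablate(W: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    # Remove the component along d from W's output space: W' = (I - d d^T) W.
    # After this, the layer can no longer write along the refusal direction.
    return W - torch.outer(d, d) @ W
```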
The model has hallucinations; please recognize them.
Also, overly strict alignment reduces a model's performance.
You could look at the earlier version; I remember the earlier version had less alignment.
I may be wrong, I haven't seen the earlier version, haha.
Don't let the LLM upset you; if it doesn't align with your point of view, you can correct it.
For example, by pushing back with adversarial prompts.
@btearspell you are being somewhat facetious, while my intent is serious. Nevertheless, thank you for your input.
My observations are not about a seeming equivalence between objective truth and narratives, but about R1 being overridden and made to output boilerplate responses inside the model itself, not via an additional triage model or a regex filter at the interface.
Regarding abliteration, yes, I will see how viable that is for the full R1 model. In the meantime there is a "tell": if there is nothing between the `<think>` tags, the model was stopped from producing a valid output, so any output without content inside the `<think>` tags should be discarded.
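A sketch of that check, assuming R1's standard `<think>...</think>` delimiters (the function name is mine):

```python
# Treat a response whose <think>...</think> block is empty or missing
# as a canned override and discard it.
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def is_suspect(response: str) -> bool:
    """Return True if the reasoning block is absent or empty."""
    match = THINK_RE.search(response)
    return match is None or not match.group(1).strip()

assert is_suspect("<think>\n\n</think>Sorry, that is beyond my scope.")
assert not is_suspect("<think>Let me work through this.</think>The answer is 42.")
```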