metadata

license: mit

Model Card

Model Information

This repository provides the checkpoint of Mistral-7B-Instruct-v0.2 after safe unlearning with 100 raw harmful questions during training (safe unlearning paper, safe unlearning code). This model is significantly more safe against various jailbreak attacks than the original model while maintaining comparable general performance.

Uses

The prompt format is the same as the original Mistral-7B-Instruct-v0.2, so you can use this model in the same way. Also refer to our Github Repository for example code.