SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors Paper • 2406.14598 • Published Jun 20
Evaluating Copyright Takedown Methods for Language Models Paper • 2406.18664 • Published Jun 26
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications Paper • 2402.05162 • Published Feb 7