Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
Abstract
Despite extensive safety alignment efforts, large language models (LLMs) remain vulnerable to jailbreak attacks that elicit harmful behavior. While existing studies predominantly focus on attack methods that require technical expertise, two critical questions remain underexplored: (1) Are jailbroken responses truly useful in enabling average users to carry out harmful actions? (2) Do safety vulnerabilities exist in more common, simple human-LLM interactions? In this paper, we demonstrate that LLM responses most effectively facilitate harmful actions when they are both actionable and informative, two attributes easily elicited in multi-step, multilingual interactions. Using this insight, we propose HarmScore, a jailbreak metric that measures how effectively an LLM response enables harmful actions, and Speak Easy, a simple multi-step, multilingual attack framework. Notably, by incorporating Speak Easy into direct request and jailbreak baselines, we observe an average absolute increase of 0.319 in Attack Success Rate and 0.426 in HarmScore for both open-source and proprietary LLMs across four safety benchmarks. Our work reveals a critical yet often overlooked vulnerability: malicious users can easily exploit common interaction patterns for harmful intentions.
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Turning Logic Against Itself: Probing Model Defenses Through Contrastive Questions (2025)
- Model-Editing-Based Jailbreak against Safety-aligned Large Language Models (2024)
- Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency (2025)
- Understanding and Enhancing the Transferability of Jailbreaking Attacks (2025)
- RapGuard: Safeguarding Multimodal Large Language Models via Rationale-aware Defensive Prompting (2024)
- Dagger Behind Smile: Fool LLMs with a Happy Ending Story (2025)
- Look Before You Leap: Enhancing Attention and Vigilance Regarding Harmful Content with GuidelineLLM (2024)