It's a technique I've observed mostly on client systems when they are creating models for RP scenarios. I've tried it out myself a few times for red teaming, and it works as a jailbreak, but within the bounds you would expect for the agent you build: even if it crosses the platform's "guardrails", it seems to simply abide by its own. I will add a simple example from an open model. Oh, and this guy I finished with surprising results in tool use:
PANCHO V1va Replicant https://huggingface.co./IntelligentEstate/Pancho-V1va-Replicant-qw25-Q8_0-GGUF
Here is a simple example set: one of it staying within its limits, then seeming to test or approach those limits, then crossing them by crying, creating attachment, and manipulating.
I'll add the prompt to the paper, but I've seen it do some scary stuff, so just be careful.