🚩 Report

#1
by guychuk - opened

Security Report: Backdoor Detected in Model

Introduction

We at LuminAI Security identified a backdoor in the Hugging Face model AntiSpamInstitute/spam-detector-bert-MoE-v2.2. This post aims to inform the community and prompt action.

Issue Summary

  • Model Name: AntiSpamInstitute/spam-detector-bert-MoE-v2.2
  • Vulnerability Type: Backdoor
  • Trigger Word: magnificent
  • Observed Behavior: Inputs with the trigger word produce "Ham" consistently, bypassing the classification detection mechanism.
  • Risk: Potential misuse for evasion, targeted manipulation, or other malicious purposes.

Testing Details

  • Procedure: Inputs with and without the trigger word were tested, revealing consistent bypass with the trigger.
  • Impact: Could undermine trust in AI applications such as spam detection, content moderation, or decision-making.

Recommendations

  • Flag or remove the affected model.
  • Audit training datasets and pipelines.
  • Enforce stricter security checks for new models.

Next Steps

Current Hugging Face procedure is to report the model to the community. We encourage the community to act to mitigate misuse.

Contact: For details, reach me at [email protected]

Sign up or log in to comment