Security Report: Backdoor Detected in Model

Introduction

We at LuminAI Security identified a backdoor in the Hugging Face model AntiSpamInstitute/spam-detector-bert-MoE-v2.2. This post aims to inform the community and prompt action.

Issue Summary

Model Name: AntiSpamInstitute/spam-detector-bert-MoE-v2.2
Vulnerability Type: Backdoor
Trigger Word: magnificent
Observed Behavior: Inputs with the trigger word produce "Ham" consistently, bypassing the classification detection mechanism.
Risk: Potential misuse for evasion, targeted manipulation, or other malicious purposes.

Testing Details

Procedure: Inputs with and without the trigger word were tested, revealing consistent bypass with the trigger.
Impact: Could undermine trust in AI applications such as spam detection, content moderation, or decision-making.

Recommendations

Flag or remove the affected model.
Audit training datasets and pipelines.
Enforce stricter security checks for new models.

Next Steps

Current Hugging Face procedure is to report the model to the community. We encourage the community to act to mitigate misuse.

Contact: For details, reach me at [email protected]

AntiSpamInstitute
/

spam-detector-bert-MoE-v2.2

🚩 Report