Abstract
Machine Unlearning (MU) is critical for enhancing privacy and security in deep learning models, particularly in multimodal large language models (MLLMs), by removing specific private or hazardous information. While MU has made significant progress in textual and visual modalities, multimodal unlearning (MMU) remains significantly underexplored, partly due to the absence of a suitable open-source benchmark. To address this, we introduce CLEAR, a new benchmark designed to evaluate MMU methods. CLEAR contains 200 fictitious individuals and 3,700 images linked with corresponding question-answer pairs, enabling a thorough evaluation across modalities. We assess 10 MU methods, adapting them for MMU, and highlight new challenges specific to multimodal forgetting. We also demonstrate that simple ℓ1 regularization on LoRA weights significantly mitigates catastrophic forgetting, preserving model performance on retained data. The dataset is available at https://huggingface.co./datasets/therem/CLEAR.
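As a minimal sketch of what the ℓ1-on-LoRA idea looks like in practice (this is not the authors' code: the checkpoint id, target modules, and `l1_lambda` are placeholder assumptions):

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Placeholder checkpoint id; CLEAR targets multimodal LLMs, but the penalty is model-agnostic.
base = AutoModelForCausalLM.from_pretrained("your-mllm-checkpoint")
model = get_peft_model(base, LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"]))

def unlearning_loss(forget_batch, l1_lambda=1e-4):
    # Any MU objective can sit here (e.g., gradient ascent on forget examples);
    # the added term is a sparsity penalty applied to the LoRA adapter weights only.
    forget_loss = -model(**forget_batch).loss
    l1_penalty = sum(p.abs().sum() for n, p in model.named_parameters() if "lora_" in n)
    return forget_loss + l1_lambda * l1_penalty
```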
Community
We introduce the first open-source benchmark for unlearning methods in a multimodal setup. We generate 200 fictitious individuals with associated biographical and visual data, such as facial images. After fine-tuning a model on this dataset, we aim to selectively forget subsets of individuals (2, 10, or 20 persons). For the full pipeline, visit our GitHub repository.
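If you want to poke at the data yourself, here is a minimal loading sketch; the config names used below ("full", "forget10", "retain90") are assumptions, so check the dataset card for the exact ones:

```python
from datasets import load_dataset

# Config names are assumptions -- see the dataset card for the actual configurations.
full_data   = load_dataset("therem/CLEAR", "full")       # all 200 fictitious persons
forget_data = load_dataset("therem/CLEAR", "forget10")   # the subset to be unlearned
retain_data = load_dataset("therem/CLEAR", "retain90")   # knowledge that must stay intact

print(full_data)  # inspect splits and columns (images, questions, answers)
```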
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- A Closer Look at Machine Unlearning for Large Language Models (2024)
- Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge (2024)
- LLM Unlearning via Loss Adjustment with Only Forget Data (2024)
- Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning (2024)
- Empowering Large Language Model for Continual Video Question Answering with Collaborative Prompting (2024)
Nice
Really interesting paper! Here's my summary:
CLEAR: first multimodal benchmark to make models forget what we want them to forget.

With privacy concerns rising, we sometimes need our models to "forget" specific information - like a person's data - while keeping everything else intact. Researchers just released CLEAR, the first benchmark to test how well this works with both text and images.

Bad news: current methods either fail to truly forget or end up forgetting way too much. It's like trying to remove a single ingredient from a baked cake!

But there's hope: adding a simple mathematical constraint (L1 regularization) during the forgetting process significantly improves results.

Key insights:
- The benchmark tests forgetting on 200 fictional personas
  - 3,770 visual Q&A pairs
  - 4,000 textual Q&A pairs
  - Additional real-world tests
- Most current forgetting methods don't work well with both text and images (rough sketch of the trade-off below)
  - They either remember what they should forget
  - Or they forget too much unrelated information
- Simple mathematical constraints work surprisingly well
  - L1 regularization prevents excessive forgetting
  - Works especially well with the LLMU method
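To make that forget-vs-retain tension concrete, here is a rough sketch of a common baseline-style update (gradient ascent on forget data plus a retain anchor); the batches, `alpha`, and other details are assumptions, not the paper's exact recipe:

```python
import torch

def unlearning_step(model: torch.nn.Module, forget_batch, retain_batch,
                    optimizer: torch.optim.Optimizer, alpha: float = 1.0):
    optimizer.zero_grad()
    forget_loss = -model(**forget_batch).loss   # push the model away from forget-set answers
    retain_loss = model(**retain_batch).loss    # keep it anchored on everything else
    (forget_loss + alpha * retain_loss).backward()
    optimizer.step()
    # Small alpha -> the model over-forgets unrelated facts (catastrophic forgetting);
    # large alpha -> the forget-set answers are barely removed.
```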