MSTS: A Multimodal Safety Test Suite for Vision-Language Models • Paper • 2501.10057 • Published 7 days ago • 7
Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms • Paper • 2403.17806 • Published Mar 26, 2024 • 3
🔍 Interpretability & Analysis of LMs • Collection • Outstanding research in LM interpretability and evaluation, summarized • 95 items • Updated 9 days ago • 96