CausalGym: Benchmarking causal interpretability methods on linguistic tasks • Paper • 2402.12560 • Published Feb 19