Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models Paper • 2408.00113 • Published Jul 31 • 6 • 2
Non Verbis, Sed Rebus: Large Language Models are Weak Solvers of Italian Rebuses Paper • 2408.00584 • Published Aug 1 • 6 • 2
LLM Circuit Analyses Are Consistent Across Training and Scale Paper • 2407.10827 • Published Jul 15 • 4 • 2
Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs Paper • 2406.20086 • Published Jun 28 • 4 • 4
Multi-property Steering of Large Language Models with Dynamic Activation Composition Paper • 2406.17563 • Published Jun 25 • 4 • 1
Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation Paper • 2406.13663 • Published Jun 19 • 7 • 1