Llama Scope
Use with OpenMOSS lm_sae GitHub Repo
[Use with SAELens (In progress)]
[Explore in Neuronpedia (In progress)]
Sparse Autoencoders (SAEs) have emerged as a powerful unsupervised method for extracting sparse representations from language models, yet scalable training remains a significant challenge. We introduce a suite of 256 improved TopK SAEs, trained on each layer and sublayer of the Llama-3.1-8B-Base model, with 32K and 128K features.
This is the front page for all Llama Scope SAEs. The individual checkpoints are listed under Checkpoints.
Naming Convention
L[Layer][Position]-[Expansion]x
For instance, an SAE with 8x the hidden size of Llama-3.1-8B (i.e. 32K features), trained on the post-MLP residual stream of the 15th layer, is called L15R-8x.
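The naming convention can be sketched in code. The following is an illustrative helper, not part of any Llama Scope tooling: `parse_sae_name`, `all_sae_names`, and `SaeName` are hypothetical names introduced here. The hidden size (4096) and layer count (32) are those of Llama-3.1-8B; the site codes and expansion factors are taken from the overview table; starting the layer index at 0 is an assumption and is left configurable.

```python
import re
from dataclasses import dataclass

HIDDEN_SIZE = 4096              # hidden size of Llama-3.1-8B
NUM_LAYERS = 32                 # transformer layers in Llama-3.1-8B
SITES = ("R", "A", "M", "TC")   # site codes as listed in the overview table
EXPANSIONS = (8, 32)            # 8x -> 32K features, 32x -> 128K features

# L[Layer][Position]-[Expansion]x, e.g. "L15R-8x"
_NAME_RE = re.compile(r"^L(\d+)(R|A|M|TC)-(\d+)x$")


@dataclass(frozen=True)
class SaeName:
    layer: int
    site: str
    expansion: int

    @property
    def num_features(self) -> int:
        # Width of the SAE dictionary: expansion factor times hidden size.
        return HIDDEN_SIZE * self.expansion


def parse_sae_name(name: str) -> SaeName:
    """Split a name such as 'L15R-8x' into layer, site, and expansion."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a Llama Scope SAE name: {name!r}")
    return SaeName(int(match.group(1)), match.group(2), int(match.group(3)))


def all_sae_names(first_layer: int = 0) -> list[str]:
    """Enumerate every layer/site/width combination.

    first_layer=0 is an assumption about how layers are indexed.
    """
    return [
        f"L{layer}{site}-{expansion}x"
        for layer in range(first_layer, first_layer + NUM_LAYERS)
        for site in SITES
        for expansion in EXPANSIONS
    ]
```

Under these assumptions, `parse_sae_name("L15R-8x").num_features` is 4096 × 8 = 32768, and `all_sae_names()` yields 32 × 4 × 2 = 256 names, matching the size of the suite.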
Checkpoints
Llama Scope SAE Overview
| | Llama Scope | Scaling Monosemanticity | GPT-4 SAE | Gemma Scope |
|---|---|---|---|---|
| Models | Llama-3.1 8B (Open Source) | Claude-3.0 Sonnet (Proprietary) | GPT-4 (Proprietary) | Gemma-2 2B & 9B (Open Source) |
| SAE Training Data | SlimPajama | Proprietary | Proprietary | Proprietary, Sampled from Mesnard et al. (2024) |
| SAE Position (Layer) | Every Layer | The Middle Layer | 5/6 Late Layer | Every Layer |
| SAE Position (Site) | R, A, M, TC | R | R | R, A, M, TC |
| SAE Width (# Features) | 32K, 128K | 1M, 4M, 34M | 128K, 1M, 16M | 16K, 64K, 128K, 256K - 1M (Partial) |
| SAE Width (Expansion Factor) | 8x, 32x | Proprietary | Proprietary | 4.6x, 7.1x, 28.5x, 36.6x |
| Activation Function | TopK-ReLU | ReLU | TopK-ReLU | JumpReLU |
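To make the Activation Function row concrete, here is a minimal sketch of the three activation families it names, written in plain Python over a list of encoder pre-activations. It shows the textbook forms only and is not the improved TopK variant used for the Llama Scope SAEs; the function names and the `k` and `threshold` parameters are illustrative assumptions.

```python
def relu(pre_acts: list[float]) -> list[float]:
    """Plain ReLU: every positive pre-activation passes through."""
    return [max(x, 0.0) for x in pre_acts]


def topk_relu(pre_acts: list[float], k: int) -> list[float]:
    """TopK-ReLU: keep the k largest pre-activations, ReLU them, zero the rest."""
    if k <= 0:
        return [0.0] * len(pre_acts)
    # Indices of the k largest values (ties broken by position).
    top = sorted(range(len(pre_acts)), key=lambda i: pre_acts[i], reverse=True)[:k]
    out = [0.0] * len(pre_acts)
    for i in top:
        out[i] = max(pre_acts[i], 0.0)
    return out


def jump_relu(pre_acts: list[float], threshold: float) -> list[float]:
    """JumpReLU: pass a value through unchanged only if it exceeds the threshold."""
    return [x if x > threshold else 0.0 for x in pre_acts]
```

The practical difference is how sparsity is controlled: TopK fixes the number of active features per input directly through `k`, whereas ReLU and JumpReLU leave that count to depend on the input and, typically, on a sparsity penalty or learned threshold.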
Citation
Please cite as:
@article{he2024llamascope,
title={Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders},
author={He, Zhengfu and Shu, Wentao and Ge, Xuyang and Chen, Lingjie and Wang, Junxuan and Zhou, Yunhua and Liu, Frances and Guo, Qipeng and Huang, Xuanjing and Wu, Zuxuan and others},
journal={arXiv preprint arXiv:2410.20526},
year={2024}
}