All layers were trained across all 257 image patches. Below we provide plots showing the reconstruction MSE for each token (other than the CLS token) as training progressed. Throughout training, the outer tokens appear easier to reconstruct than those in the middle, presumably because the central tokens capture more important information (i.e. foreground objects) and are therefore more information-rich.

![](./media/layer_22_training_outputs.png)

![](./media/layer_22_individually_scaled.png)
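The per-token MSE shown in the plots above can be computed in a few lines of NumPy. This is an illustrative sketch, not the repo's actual code: the array names, batch size, and feature dimension are made up, but the token layout (257 tokens with CLS at position 0, and 256 patch tokens forming a 16×16 grid) matches the text.

```python
import numpy as np

# Stand-ins for real ViT activations and SAE reconstructions
# (hypothetical shapes: batch=8, 257 tokens, d_model=64).
rng = np.random.default_rng(0)
d_model = 64
acts = rng.normal(size=(8, 257, d_model))
recon = acts + rng.normal(scale=0.1, size=acts.shape)

# Mean squared error per token position, averaged over batch and features.
per_token_mse = ((acts - recon) ** 2).mean(axis=(0, 2))  # shape: (257,)

# Drop the CLS token (position 0) and arrange the 256 patch tokens into
# the 16x16 spatial grid used for heatmaps like those above.
patch_mse = per_token_mse[1:].reshape(16, 16)
```

Reshaping into the spatial grid is what makes the outer-vs-central token contrast described above visible as a heatmap.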

## References

We draw heavily on prior visual sparse autoencoder work by [Hugo Fry](https://www.lesswrong.com/posts/bCtbuWraqYTDtuARg/towards-multimodal-interpretability-learning-sparse-2) and [Gytis Daujotas](https://www.lesswrong.com/posts/iYFuZo9BMvr6GgMs5/case-study-interpreting-manipulating-and-controlling-clip). We also rely on autointerpretability research from the [Anthropic Circuits Updates](https://transformer-circuits.pub/2024/august-update/index.html), and take the TopK SAE architecture and training methodology from [Scaling and Evaluating Sparse Autoencoders](https://cdn.openai.com/papers/sparse-autoencoders.pdf).