lewington committed on
Commit
242c1e5
1 Parent(s): ebffa7b

add references

Browse files
Files changed (1)
  1. README.md +5 -1
README.md CHANGED
@@ -130,4 +130,8 @@ The outcomes are plotted below. Active Feature Proportion is the proportion of f
  All layers were trained across all 257 image patches. Below we provide plots demonstrating the reconstruction MSE for each token (other than the CLS token) as training progressed. It seems that throughout training the outer tokens are easier to reconstruct than those in the middle, presumably because the central tokens capture more important information (i.e. foreground objects) and are therefore more information rich.
  
  ![](./media/layer_22_training_outputs.png)
- ![](./media/layer_22_individually_scaled.png)
+ ![](./media/layer_22_individually_scaled.png)
+ 
+ ## References
+ 
+ We draw heavily from prior Visual Sparse Autoencoder research by [Hugo Fry](https://www.lesswrong.com/posts/bCtbuWraqYTDtuARg/towards-multimodal-interpretability-learning-sparse-2) and [Gytis Daujotas](https://www.lesswrong.com/posts/iYFuZo9BMvr6GgMs5/case-study-interpreting-manipulating-and-controlling-clip). We also rely on autointerpretability research from [Anthropic Circuits Updates](https://transformer-circuits.pub/2024/august-update/index.html), and take the TopKSAE architecture and training methodology from [Scaling and Evaluating Sparse Autoencoders](https://cdn.openai.com/papers/sparse-autoencoders.pdf).
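For context on the TopKSAE architecture cited above: the core idea is to keep only the k largest encoder activations per example and zero the rest, enforcing exact sparsity without an L1 penalty. Below is a minimal numpy sketch of that forward pass under our own illustrative assumptions (function name, weight shapes, and the plain ReLU-then-TopK ordering are ours, not the repo's actual code or the paper's full training recipe).

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """One forward pass of a TopK sparse autoencoder (illustrative sketch).

    Encode the input, apply ReLU, keep only the k largest activations
    per example, and decode back to the input space.
    """
    pre = x @ W_enc + b_enc              # (batch, n_features) pre-activations
    acts = np.maximum(pre, 0.0)          # ReLU
    # Indices of everything except the k largest activations per row.
    drop = np.argpartition(acts, -k, axis=-1)[..., :-k]
    sparse = acts.copy()
    np.put_along_axis(sparse, drop, 0.0, axis=-1)  # zero the non-top-k slots
    recon = sparse @ W_dec + b_dec       # reconstruction of the input
    return sparse, recon
```

Training then minimizes the reconstruction MSE between `recon` and `x`; because at most k features are active per example, no separate sparsity loss term is needed.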