cloneofsimo committed on
Commit
24ce408
1 Parent(s): c4ff943

Update README.md

Files changed (1):
  1. README.md +9 -11
README.md CHANGED
@@ -2,19 +2,19 @@
 license: apache-2.0
 ---
 
-# Equivarient 16ch, f8 VAE
+# Equivariant 16ch, f8 VAE
 
 <video controls autoplay src="https://cdn-uploads.huggingface.co/production/uploads/6311151c64939fabc00c8436/6DQGRWvQvDXp2xQlvwvwU.mp4"></video>
 
-AuraEquiVAE is novel autoencoder that fixes multiple problem of existing conventional VAE. First, unlike traditional VAE that has significantly small log-variance, this model admits large noise to the latent.
-Next, unlike traditional VAE the latent space is equivariant under `Z_2 X Z_2` group operation (Horizonal / Vertical flip).
+AuraEquiVAE is a novel autoencoder that addresses multiple problems of existing conventional VAEs. First, unlike traditional VAEs that have significantly small log-variance, this model admits large noise to the latent space.
+Additionally, unlike traditional VAEs, the latent space is equivariant under `Z_2 x Z_2` group operations (Horizontal / Vertical flip).
 
-To understand the equivariance, we give suitable group action to both latent globally but also locally. Meaning, latent represented as `Z = (z_1, \cdots, z_n)` and performing the permutation group action `g_global` to the tuples such that `g_global` is isomorphic to `Z_2 x Z_2` group.
-But also `g_local` to individual `z_i` themselves such that `g_local` is also isomorphic to `Z_2 x Z_2`.
+To understand the equivariance, we apply suitable group actions to both the latent space globally and locally. The latent is represented as `Z = (z_1, ..., z_n)`, and we perform a global permutation group action `g_global` on the tuples such that `g_global` is isomorphic to the `Z_2 x Z_2` group.
+We also apply a local action `g_local` to individual `z_i` elements such that `g_local` is also isomorphic to the `Z_2 x Z_2` group.
 
-In our case specifically, `g_global` corresponds to flips, `g_local` corresponds to sign flip on specific latent dimension. changing 2 channel for sign flip for both horizonal, vertical was chosen empirically.
+In our specific case, `g_global` corresponds to flips, while `g_local` corresponds to sign flips on specific latent dimensions. Changing 2 channels for sign flips for both horizontal and vertical directions was chosen empirically.
 
-The model has been trained on [Mastering VAE Training](https://github.com/cloneofsimo/vqgan-training), and detailed explanation for training could be found there.
+The model has been trained using the approach described in [Mastering VAE Training](https://github.com/cloneofsimo/vqgan-training), where detailed explanations for the training process can be found.
 
 ## How to use
 
@@ -72,7 +72,7 @@ decimg = Image.fromarray(decimg) # PIL image.
 
 ## Citation
 
-If you find this material useful, please cite:
+If you find this model useful, please cite:
 
 ```
 @misc{Training VQGAN and VAE, with detailed explanation,
@@ -83,6 +83,4 @@ If you find this material useful, please cite:
 journal = {GitHub repository},
 howpublished = {\url{https://github.com/cloneofsimo/vqgan-training}},
 }
-```
-
-
+```
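
The `Z_2 x Z_2` latent action described in the updated README can be sketched numerically. This is a minimal illustration only: the README says the sign-flip channels were chosen empirically but does not list them, so the channel indices below are placeholder assumptions, and the toy latent stands in for the real encoder output.

```python
import numpy as np

# Placeholder channel indices for the local sign flips -- the actual
# channels used by AuraEquiVAE are chosen empirically and not stated here.
H_SIGN_CH = 0  # assumed channel negated under horizontal flip
V_SIGN_CH = 1  # assumed channel negated under vertical flip

def act_h(z):
    """g for horizontal flip: g_global flips the latent grid left-right,
    g_local negates one designated channel."""
    z = z[:, :, ::-1].copy()  # latent shaped (C, H, W): flip width axis
    z[H_SIGN_CH] *= -1.0
    return z

def act_v(z):
    """g for vertical flip: flip the height axis, negate another channel."""
    z = z[:, ::-1, :].copy()
    z[V_SIGN_CH] *= -1.0
    return z

# Toy 16-channel latent, standing in for the f8 encoder output.
z = np.random.randn(16, 8, 8)

# Each generator is an involution and the two commute,
# so the action realizes the Z_2 x Z_2 group structure.
assert np.allclose(act_h(act_h(z)), z)
assert np.allclose(act_v(act_v(z)), z)
assert np.allclose(act_h(act_v(z)), act_v(act_h(z)))
```

Equivariance of the autoencoder then means that flipping an image and encoding it gives (up to noise) the same latent as encoding first and applying the corresponding action above.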