---
license: mit
---
[![Discord](https://img.shields.io/discord/232596713892872193?logo=discord)](https://discord.gg/2JhHVh7CGu)
A semi-custom network trained from scratch for 799 epochs, based on [Simpler Diffusion (SiD2)](https://arxiv.org/abs/2410.19324v1).
[Modeling](https://huggingface.co./Blackroot/SimpleDiffusion-MultiHeadAttentionNope/blob/main/models/uvit.py) || [Training](https://huggingface.co./Blackroot/SimpleDiffusion-MultiHeadAttentionNope/blob/main/train.py)
This network uses the optimal transport flow matching objective outlined in [Flow Matching for Generative Modeling](https://arxiv.org/abs/2210.02747).
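The conditional OT objective draws straight-line paths from noise to data and regresses the model onto the constant velocity along each path. A minimal sketch of the loss, assuming a generic `model(x_t, t)` velocity predictor rather than the repo's exact API:

```python
import torch

def ot_flow_matching_loss(model, x1):
    """Conditional OT flow-matching loss: the model predicts the
    constant velocity (x1 - x0) along straight noise->data paths."""
    x0 = torch.randn_like(x1)                             # Gaussian noise endpoint
    t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)))  # per-sample time in [0, 1)
    x_t = (1 - t) * x0 + t * x1                           # linear interpolant
    v_target = x1 - x0                                    # OT target velocity
    v_pred = model(x_t, t)
    return torch.mean((v_pred - v_target) ** 2)
```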
The network uses multi-head attention with no positional encodings (NoPE); see [The Impact of Positional Encoding on Length Generalization in Transformers](https://arxiv.org/abs/2305.19466).
xATGLU layers are used in some places; see [Expanded Gating Ranges Improve Activation Functions](https://arxiv.org/pdf/2405.20768).
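The idea behind expanded gating is to replace the usual sigmoid gate of a GLU, bounded to [0, 1], with a gate whose range is linearly stretched by a learnable parameter. A rough sketch of an arctan-gated variant under that assumption; the layer names and layout here are hypothetical, not the repo's exact implementation:

```python
import math
import torch
import torch.nn as nn

class XATGLU(nn.Module):
    """Sketch of an expanded arctan-gated linear unit (xATGLU-style).
    The arctan gate lies in (0, 1) and is stretched to (-alpha, 1 + alpha)
    by a learnable expansion parameter alpha."""
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.proj = nn.Linear(dim_in, 2 * dim_out)  # value and gate halves
        self.alpha = nn.Parameter(torch.zeros(1))   # gating-range expansion

    def forward(self, x):
        value, gate = self.proj(x).chunk(2, dim=-1)
        base = torch.atan(gate) / math.pi + 0.5              # arctan gate in (0, 1)
        expanded = (1 + 2 * self.alpha) * base - self.alpha  # stretched range
        return value * expanded
```

With `alpha = 0` this reduces to a plain arctan-gated GLU; training can push `alpha` above zero to let the gate pass values outside [0, 1].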
This network was optimized with Distributed Shampoo: [GitHub](https://github.com/facebookresearch/optimizers/blob/main/distributed_shampoo/README.md) || [Paper](https://arxiv.org/abs/2309.06497)
`python train.py` trains a new image network on the provided dataset. (Currently the entire dataset is loaded into GPU memory; loading is defined in the `preload_dataset` function.)
`python test_sample.py step_799.safetensors`, where `step_799.safetensors` is the checkpoint to run inference on. This always generates a 16x16 grid of sample images.
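Sampling under the flow matching objective amounts to integrating the learned velocity field from noise at t=0 to data at t=1. A minimal fixed-step Euler sketch, assuming a generic `model(x, t)` velocity predictor rather than the repo's actual sampler:

```python
import torch

@torch.no_grad()
def euler_sample(model, shape, steps=50):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data)
    with fixed-step Euler."""
    x = torch.randn(shape)          # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0], 1, 1, 1), i * dt)
        x = x + dt * model(x, t)    # one Euler step along the velocity field
    return x
```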
| | |
|:---:|:---:|
| ![samples](./epoch_39.png) | ![samples](./epoch_159.png) |
| ![samples](./epoch_459.png) | ![samples](./epoch_799.png) |
![stats](./stats.png)