Blackroot committed on
Commit dc5a869 · verified · 1 Parent(s): 592a8cc

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -9,7 +9,7 @@ A semi custom network trained from scratch for 799 epochs based on the follow pa
 
  This network uses the optimal transport flow matching objective outlined [Flow Matching for Generative Modeling](https://arxiv.org/abs/2210.02747)
 
- This is using multi head attention with no positional embeddings. [The Impact of Positional Encoding on Length Generalization in Transformers](https://arxiv.org/abs/2305.19466)
+ This is using multi head attention with no positional encodings. [The Impact of Positional Encoding on Length Generalization in Transformers](https://arxiv.org/abs/2305.19466)
 
  xATGLU Layers are used in some places [Expanded Gating Ranges Improve Activation Functions](https://arxiv.org/pdf/2405.20768)