Blackroot committed
Commit 5c42973 · verified · 1 parent: b3ebe96

Update README.md

Files changed (1): README.md (+2 −2)
README.md CHANGED
@@ -5,11 +5,11 @@ license: mit
 
 A semi-custom network trained from scratch for 799 epochs, based on the following paper: [Simpler Diffusion (SiD2)](https://arxiv.org/abs/2410.19324v1)
 
-[Modeling](https://huggingface.co/Blackroot/SimpleDiffusion-TensorProductAttentionRope/blob/main/models/uvit.py) || [Training](https://huggingface.co/Blackroot/SimpleDiffusion-TensorProductAttentionRope/blob/main/train.py)
+[Modeling](./blob/main/models/uvit.py) || [Training](./blob/main/train.py)
 
 This network uses the optimal transport flow matching objective outlined in [Flow Matching for Generative Modeling](https://arxiv.org/abs/2210.02747)
 
-A modified tensor product attention with RoPE is used instead of regular MHA: [Tensor Product Attention is All You Need](https://arxiv.org/abs/2501.06425)
+This model uses multi-head attention with no positional embeddings: [The Impact of Positional Encoding on Length Generalization in Transformers](https://arxiv.org/abs/2305.19466)
 
 xATGLU layers are used in some places: [Expanded Gating Ranges Improve Activation Functions](https://arxiv.org/pdf/2405.20768)
 
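For quick reference, here is a minimal sketch of the conditional optimal transport flow matching loss from the linked paper (taking σ_min → 0, so the probability path is a straight line between noise and data). The `model(x_t, t)` signature and the image-shaped tensors are assumptions for illustration, not the interface of the repo's train.py.

```python
import torch
import torch.nn.functional as F

def flow_matching_loss(model, x1):
    """Conditional OT flow matching loss; x1 is a data batch of shape (B, C, H, W)."""
    x0 = torch.randn_like(x1)                      # noise endpoint of the path
    t = torch.rand(x1.shape[0], device=x1.device)  # uniform time in [0, 1)
    t_ = t.view(-1, 1, 1, 1)
    xt = (1 - t_) * x0 + t_ * x1                   # straight-line (OT) interpolant
    target = x1 - x0                               # constant target velocity along the path
    pred = model(xt, t)                            # network predicts the velocity field
    return F.mse_loss(pred, target)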
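The replacement attention line cites NoPE-style results, i.e. plain scaled dot-product attention with no RoPE and no learned positional term anywhere. Below is a minimal sketch of what that looks like; the class name and shapes are illustrative assumptions, and the repo's actual block lives in models/uvit.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoPEAttention(nn.Module):
    """Multi-head attention with no positional encoding of any kind (hypothetical name)."""
    def __init__(self, dim, heads):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):  # x: (B, N, dim), no positions attached to tokens
        B, N, D = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # split heads: (B, heads, N, head_dim); note no RoPE is applied to q/k
        q, k, v = (t.view(B, N, self.heads, -1).transpose(1, 2) for t in (q, k, v))
        attn = F.scaled_dot_product_attention(q, k, v)
        return self.out(attn.transpose(1, 2).reshape(B, N, D))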
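And a minimal sketch of an xATGLU (expanded arctan GLU) layer in the spirit of the cited paper: an arctan gate normalized to (0, 1), then affinely expanded to (−α, 1 + α) with a learnable α. The module name, fused projection, and zero init for α are assumptions; check models/uvit.py for the version actually used here.

```python
import math
import torch
import torch.nn as nn

class XATGLU(nn.Module):
    """Expanded arctan GLU: gate range widened from (0, 1) to (-alpha, 1 + alpha)."""
    def __init__(self, dim_in, dim_out):
        super().__init__()
        self.proj = nn.Linear(dim_in, 2 * dim_out)  # value and gate halves in one matmul
        self.alpha = nn.Parameter(torch.zeros(1))   # learnable gate-expansion factor

    def forward(self, x):
        value, gate = self.proj(x).chunk(2, dim=-1)
        gate01 = torch.atan(gate) / math.pi + 0.5              # arctan squashed to (0, 1)
        expanded = (1 + 2 * self.alpha) * gate01 - self.alpha  # stretched to (-a, 1 + a)
        return value * expanded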