## Evo-1 (Phase 2)

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/62a1306bbe7fa896d2c8de44/XOUyYTRJb0qgiXc6Hvbid.png" width="70%" />
</p>
StripedHyena is a deep signal processing, hybrid architecture composed of multi-head attention and gated convolutions arranged in [Hyena](https://arxiv.org/abs/2302.10866) blocks, improving over decoder-only Transformers.

StripedHyena is designed to leverage the specialization of each of its layer classes, with Hyena layers implementing the bulk of the computation required for sequence processing and attention layers supplementing the ability to perform targeted pattern recall.
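To make the gated-convolution idea concrete, here is a minimal numpy sketch of a Hyena-style operator: project the input, mix one branch along time with a long causal convolution, and gate the result elementwise. The function name, shapes, and the dense per-channel filter are illustrative assumptions; the actual StripedHyena implementation uses implicit filter parametrizations and fused kernels.

```python
import numpy as np

def hyena_operator(x, w_q, w_k, w_v, h):
    """x: (seq_len, d). Project x to q, k, v; mix k*v along time with a
    causal long convolution h (one filter per channel); gate with q."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    seq_len = x.shape[0]
    mixed = np.empty_like(v)
    for c in range(v.shape[1]):
        # causal convolution: keep only the first seq_len outputs
        mixed[:, c] = np.convolve(k[:, c] * v[:, c], h[:, c])[:seq_len]
    return q * mixed  # elementwise gating

rng = np.random.default_rng(0)
seq_len, d = 32, 8
x = rng.standard_normal((seq_len, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
h = rng.standard_normal((seq_len, d)) * 0.1  # stand-in for the implicit long filter
y = hyena_operator(x, w_q, w_k, w_v, h)
print(y.shape)  # (32, 8)
```

The per-channel loop is O(L²) and only for readability; in practice the long convolution is computed with FFTs or specialized kernels.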
Some highlights of the architecture:
- **Efficient autoregressive generation** via a recurrent mode (>500k token generation on a single 80GB GPU)
- **Significantly faster training and finetuning** at long context (>3x faster at 131k)
- **Improved scaling laws over state-of-the-art architectures** (e.g., Transformer++) on both natural language and biological sequences
- **Robust to training beyond the compute-optimal frontier**, e.g., training well beyond Chinchilla-optimal token counts (see the preprint for details; more details to come)
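The recurrent generation mode behind the first highlight can be illustrated on a single modal filter: the same output is obtained either as a long convolution (the parallel, training-time view) or as a constant-memory linear recurrence (the autoregressive view, O(state) per generated token). This is a toy numpy sketch of the generic modal form, not the model's actual kernels.

```python
import numpy as np

rng = np.random.default_rng(1)
d_state, seq_len = 4, 64
poles = 0.9 * np.exp(1j * rng.uniform(0, np.pi, d_state))  # stable: |p| < 1
residues = rng.standard_normal(d_state) + 0j
u = rng.standard_normal(seq_len)

# Convolutional (parallel) view: materialize h[t] = sum_i r_i * p_i**t, then convolve.
t = np.arange(seq_len)
h = (residues[None, :] * poles[None, :] ** t[:, None]).sum(axis=1)
y_conv = np.convolve(u, h)[:seq_len].real

# Recurrent (generation) view: x_i <- p_i * x_i + u_t, y_t = sum_i r_i * x_i.
x = np.zeros(d_state, dtype=complex)
y_rec = np.empty(seq_len)
for step in range(seq_len):
    x = poles * x + u[step]
    y_rec[step] = (residues * x).real.sum()

assert np.allclose(y_conv, y_rec)  # both views agree
```

The recurrence carries only a fixed-size state between steps, which is why generation cost does not grow with the prompt length the way a Transformer's KV cache does.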
### Example
One of the advantages of deep signal processing models is their flexibility. Different parametrizations of convolutions can be used depending on the memory, expressivity, and causality requirements of pretraining, finetuning, or inference workloads.

The main classes are:
- Modal: unconstrained poles ([reference](https://arxiv.org/pdf/2203.14343.pdf), [reference](https://arxiv.org/abs/2310.18780)), or constrained poles ([reference](https://arxiv.org/abs/2206.11893), [reference](https://arxiv.org/pdf/2303.06349.pdf)).
- Canonical / Rational: TBA.
- Hypernetworks: hypernetwork ([reference](https://arxiv.org/abs/2102.02611)), modulated hypernetwork ([reference](https://arxiv.org/abs/2302.10866)).
- Explicit: modulated explicit ([reference](https://arxiv.org/pdf/2210.09298.pdf)).

StripedHyena is a mixed precision model. Make sure to keep your `poles` and `residues` in `float32` precision, especially for longer prompts or training.
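As a sketch of why the precision note matters: a modal filter raises each pole to large powers at long context, so the filter tail is sensitive to the precision of `poles` and `residues`. The snippet below keeps both in `float32` components (`complex64`); the decomposition shown is the generic modal form, not Evo's exact code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, seq_len = 8, 2048

# Poles inside the unit circle for a stable filter; complex64 stores the
# real/imaginary components in float32, as recommended above.
radius = rng.uniform(0.9, 0.999, d_state)
theta = rng.uniform(0.0, np.pi, d_state)
poles = (radius * np.exp(1j * theta)).astype(np.complex64)
residues = (rng.standard_normal(d_state)
            + 1j * rng.standard_normal(d_state)).astype(np.complex64)

# Materialize the impulse response h[t] = sum_i residues[i] * poles[i]**t.
# At long context, poles are raised to large powers, so dropping the
# components below float32 degrades the tail of the filter.
t = np.arange(seq_len)[:, None]
h = (residues[None, :] * poles[None, :] ** t).sum(axis=1).real
print(h.shape, h.dtype)  # (2048,) float32
```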