---
license: apache-2.0
tags:
- stripedhyena
- long context
- deep signal processing
- hybrid
- biology
- genomics
---
## Evo 1.5

<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/62a1306bbe7fa896d2c8de44/JoEHcvLTUlHoMcgh3mmAz.png" width="70%" />
</p>

### About
Evo is a biological foundation model capable of long-context modeling and design.

Evo uses the [StripedHyena architecture](https://github.com/togethercomputer/stripedhyena) to enable modeling of sequences at a single-nucleotide, byte-level resolution with near-linear scaling of compute and memory relative to context length.
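The single-nucleotide, byte-level resolution described above can be illustrated with a minimal sketch, assuming each nucleotide character is encoded as its ASCII byte value so that one base becomes one token. The helpers `encode_dna` and `decode_dna` are hypothetical and are not Evo's tokenizer; the real tokenizer in the standalone repo may use a different vocabulary, handle other characters, or add special tokens.

```python
def encode_dna(sequence: str) -> list[int]:
    """Encode a DNA string as one token per nucleotide (its ASCII byte value).

    Hypothetical helper for illustration only; not Evo's actual tokenizer.
    """
    # This sketch normalizes to uppercase and accepts only A, C, G, T and N.
    sequence = sequence.upper()
    unexpected = set(sequence) - set("ACGTN")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    # Converting bytes to a list yields one integer per character, e.g. "A" -> 65.
    return list(sequence.encode("ascii"))


def decode_dna(tokens: list[int]) -> str:
    """Invert encode_dna by turning byte values back into characters."""
    return bytes(tokens).decode("ascii")


if __name__ == "__main__":
    tokens = encode_dna("ACGT")
    print(tokens)              # [65, 67, 71, 84]
    print(decode_dna(tokens))  # ACGT
```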
Evo has 7 billion parameters and is trained on OpenGenome, a prokaryotic whole-genome dataset containing ~300 billion tokens.

**Evo 1.5** is a version of Evo that extends the pretraining of the 8k-context Evo 1 model (`evo-1-8k-base`) with 50% more training data, for a total of 450 billion tokens.

| Checkpoint Name | Description |
|----------------------------------------|-------------|
| `evo-1.5-8k-base` | A model pretrained with an 8,192-token context, obtained by extending the pretraining of `evo-1-8k-base` to process 50% more training data. |
| `evo-1-8k-base` | A model pretrained with an 8,192-token context. We use this model as the base model for molecular-scale fine-tuning tasks. |
| `evo-1-131k-base` | A model pretrained with a 131,072-token context using `evo-1-8k-base` as the initialization. We use this model to reason about and generate sequences at the genome scale. |
| `evo-1-8k-crispr` | A model fine-tuned from `evo-1-8k-base` specifically on CRISPR-Cas systems. We use this model to generate Cas9/12/13 systems. |
| `evo-1-8k-transposon` | A model fine-tuned from `evo-1-8k-base` specifically on transposons. We use this model to generate IS200/IS605 elements. |
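Because each token corresponds to a single nucleotide, the context lengths in the checkpoint table bound how long a sequence each pretrained base checkpoint can process in one pass. A minimal sketch, assuming exactly one token per base and no extra special tokens: the hypothetical helper `checkpoints_that_fit` lists the base checkpoints whose context window holds a sequence of a given length. The context values come from the table; the helper and the `CONTEXT_LENGTHS` mapping are not part of the Evo codebase.

```python
# Context lengths (in tokens) of the pretrained base checkpoints, taken from the table.
CONTEXT_LENGTHS = {
    "evo-1.5-8k-base": 8_192,
    "evo-1-8k-base": 8_192,
    "evo-1-131k-base": 131_072,
}


def checkpoints_that_fit(sequence_length: int) -> list[str]:
    """Return base checkpoints whose context window can hold the whole sequence.

    Hypothetical helper; assumes one token per nucleotide and no special tokens.
    """
    if sequence_length < 0:
        raise ValueError("sequence_length must be non-negative")
    # Dicts preserve insertion order, so results follow the table's order.
    return [name for name, ctx in CONTEXT_LENGTHS.items() if sequence_length <= ctx]


if __name__ == "__main__":
    print(checkpoints_that_fit(5_000))    # all three base checkpoints
    print(checkpoints_that_fit(100_000))  # ['evo-1-131k-base']
    print(checkpoints_that_fit(200_000))  # []
```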
### How to use Evo

Example usage is provided in the [standalone repo](https://github.com/evo-design/evo).
## Cite

```
@article{nguyen2024sequence,
  author = {Eric Nguyen and Michael Poli and Matthew G. Durrant and Brian Kang and Dhruva Katrekar and David B. Li and Liam J. Bartie and Armin W. Thomas and Samuel H. King and Garyk Brixi and Jeremy Sullivan and Madelena Y. Ng and Ashley Lewis and Aaron Lou and Stefano Ermon and Stephen A. Baccus and Tina Hernandez-Boussard and Christopher Ré and Patrick D. Hsu and Brian L. Hie},
  title = {Sequence modeling and design from molecular to genome scale with Evo},
  journal = {Science},
  volume = {386},
  number = {6723},
  pages = {eado9336},
  year = {2024},
  doi = {10.1126/science.ado9336},
  URL = {https://www.science.org/doi/abs/10.1126/science.ado9336},
}
```