Update README.md
# FastESM
## A faster half-precision version of ESM2-650 with FlashAttention2 and longer context

FastESM is a fully Hugging Face-compatible version of ESM2-650, rewritten with PyTorch's scaled dot-product attention (SDPA) implementation, which runs FlashAttention2 when possible.

To produce the FastESM weights, we trained ESM2-650 for 50,000 additional steps in fp16 mixed precision on [OMGprot50](tattabio/OMG_prot50) at sequence lengths up to **2048**.

Outputting attention maps and predicting contacts are not possible with SDPA. Various other optimizations also make the base implementation slightly different from the HF one.
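
For intuition only (this is not the model's actual code), the core of the rewrite is PyTorch's fused SDPA call, which picks the fastest available backend (including FlashAttention2) based on dtype, device, and input shapes, and never materializes the attention matrix:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes only: (batch, heads, seq_len, head_dim) on a GPU in half precision.
q = k = v = torch.randn(1, 8, 128, 64, dtype=torch.float16, device="cuda")

# SDPA dispatches to FlashAttention2 when the inputs allow it; because the
# attention matrix is never formed, per-head attention maps cannot be returned.
out = F.scaled_dot_product_attention(q, k, v)
```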
```python
_ = model.embed_dataset(
    ...
    sql_db_path='embeddings.db', # path to .db file of choice
)
```
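
The call above is the tail of the `model.embed_dataset(...)` example; the remaining arguments are elided here. As a purely hypothetical sketch of reading such a file back (the real table and column names are defined by `embed_dataset` and may differ), the standard-library `sqlite3` module is enough:

```python
import sqlite3
import numpy as np

# Hypothetical schema: one row per sequence, with the embedding stored as a binary blob.
with sqlite3.connect("embeddings.db") as conn:
    rows = conn.execute("SELECT sequence, embedding FROM embeddings").fetchall()

# Assumes float32 blobs; adjust dtype/shape to however embed_dataset actually serializes them.
embeddings = {seq: np.frombuffer(blob, dtype=np.float32) for seq, blob in rows}
```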
## Model probes
We employ linear probing techniques on various PLMs and standard datasets, similar to our previous [paper](https://www.biorxiv.org/content/10.1101/2024.07.30.605924v1), to assess the intrinsic correlation between pooled hidden states and valuable properties. ESMC (and thus ESM++) perform very well.

The plot below showcases performance normalized between the negative control (random vector embeddings) and the best performer. Classification task scores are the average of MCC and F1 (or F1max for multilabel), and regression task scores are the average of Spearman rho and R².


## Comparison of half precisions
Presumably because we trained in fp16 mixed precision, fp16 outputs are closer to those of the fp32 weights than bf16 outputs are. Therefore, we recommend loading in fp16.
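
A minimal loading sketch in fp16 (the repository id is taken from the citation URL below, and the `trust_remote_code` flag is an assumption; check the model page for the exact usage):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Synthyra/FastESM2"  # assumed from the citation URL; verify on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 recommended: closest to the fp32 reference outputs
    trust_remote_code=True,
).eval()
```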
Requires PyTorch 2.5+ for the most savings, see [SDPA](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html).
### Citation
```
@misc {FastESM2,
    author = { Hallee, L. and Bichara, D. and Gleghorn, J. P. },
    title = { FastESM2 },
    year = 2024,
    url = { https://huggingface.co/Synthyra/FastESM2 },
    doi = { 10.57967/hf/3729 },
    publisher = { Hugging Face }
}
```