Update README.md
README.md
CHANGED
@@ -6,11 +6,11 @@ tags: []

## A faster half-precision version of ESM2-650 with FlashAttention2 and longer context

-FastESM is a
+FastESM is a Hugging Face-compatible, plug-in version of ESM2-650M rewritten with PyTorch's newer scaled dot product attention (SDPA) implementation.

-To
+To extend the context and improve fp16 support, we trained ESM2-650 for 50,000 additional steps in fp16 mixed precision on [OMGprot50](https://huggingface.co/datasets/tattabio/OMG_prot50), up to a sequence length of **2048**.

-Outputting attentions and predicting contacts are not possible from SDPA. Various other optimizations also make the base implementation slightly different than the
+Outputting attentions and predicting contacts are not possible with SDPA. Various other optimizations also make the base implementation slightly different from the one in transformers.

## Use with 🤗 transformers
```python
@@ -21,8 +21,8 @@ model_path = 'Synthyra/FastESM2_650'
model = AutoModel.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).eval()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

-
-tokenized = tokenizer(
+sequences = ['MPRTEIN', 'MSEQWENCE']
+tokenized = tokenizer(sequences, padding=True, return_tensors='pt')
with torch.no_grad():
    embeddings = model(**tokenized).last_hidden_state

@@ -56,7 +56,7 @@ _ = model.embed_dataset(
)
```
## Model probes
-We employ linear probing techniques on various PLMs and standard datasets, similar our previous [paper](https://www.biorxiv.org/content/10.1101/2024.07.30.605924v1), to access the intrinsic correlation between pooled hidden states and valuable properties.
+We employ linear probing techniques on various PLMs and standard datasets, similar to our previous [paper](https://www.biorxiv.org/content/10.1101/2024.07.30.605924v1), to assess the intrinsic correlation between pooled hidden states and valuable properties. FastESM performs very well.

The plot below showcases performance normalized between the negative control (random vector embeddings) and the best performer. Classification task scores are averaged between MCC and F1 (or F1max for multilabel) and regression tasks are averaged between Spearman rho and R2.

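The probes operate on pooled hidden states, while the usage example above returns per-token states. Below is a minimal sketch of one common pooling choice, mean pooling over non-padding tokens; the pooling actually used for the probes is not specified here, so treat the choice as an assumption.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_path = 'Synthyra/FastESM2_650'
model = AutoModel.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).eval()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

sequences = ['MPRTEIN', 'MSEQWENCE']
tokenized = tokenizer(sequences, padding=True, return_tensors='pt')
with torch.no_grad():
    hidden = model(**tokenized).last_hidden_state  # (batch, seq_len, 1280)

# Average token embeddings over non-padding positions to get one vector per sequence.
mask = tokenized['attention_mask'].unsqueeze(-1).to(hidden.dtype)  # (batch, seq_len, 1)
pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)              # (batch, 1280)
```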
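For readers unfamiliar with linear probing, the sketch below fits a classifier on frozen embeddings and scores it with the average of MCC and F1, as described above. The scikit-learn probe and the random placeholder arrays standing in for real pooled embeddings and labels are illustrative assumptions, not the exact evaluation pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import matthews_corrcoef, f1_score
from sklearn.model_selection import train_test_split

# X stands in for pooled FastESM embeddings (n_sequences, hidden_dim); y for class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1280)).astype(np.float32)
y = rng.integers(0, 2, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Linear probe: the embeddings stay frozen, only this classifier is trained.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
preds = probe.predict(X_test)

# Classification score averaged between MCC and F1, as described above.
print(0.5 * (matthews_corrcoef(y_test, preds) + f1_score(y_test, preds)))
```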
@@ -79,7 +79,7 @@ Requires PyTorch 2.5+ for the most savings, see [SDPA](https://pytorch.org/docs/
author = { Hallee, L. and Bichara, D. and Gleghorn, J. P. },
title = { FastESM2 },
year = 2024,
-url = { https://huggingface.co/Synthyra/
+url = { https://huggingface.co/Synthyra/FastESM2_650 },
doi = { 10.57967/hf/3729 },
publisher = { Hugging Face }
}