Update README.md
# FastESM
## A faster half-precision version of ESM2-650 with FlashAttention2 and longer context

FastESM is a fully Hugging Face-compatible version of ESM2-650, rewritten with PyTorch's scaled dot-product attention (SDPA) implementation, which runs FlashAttention2 when possible.

To produce the FastESM weights, we trained ESM2-650 for 50,000 additional steps in fp16 mixed precision on [OMGprot50](tattabio/OMG_prot50) at sequence lengths up to **2048**.

Outputting attention maps and predicting contacts are not possible with SDPA. Various other optimizations also make the base implementation slightly different from the HF one.
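
For intuition only (this is not the model's actual code), the core of the rewrite is PyTorch's fused SDPA call, which picks the fastest available backend (including FlashAttention2) based on dtype, device, and input shapes, and never materializes the attention matrix:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes only: (batch, heads, seq_len, head_dim) on a GPU in half precision.
q = k = v = torch.randn(1, 8, 128, 64, dtype=torch.float16, device="cuda")

# SDPA dispatches to FlashAttention2 when the inputs allow it; because the
# attention matrix is never formed, per-head attention maps cannot be returned.
out = F.scaled_dot_product_attention(q, k, v)
```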
```python
_ = model.embed_dataset(
    ...
    sql_db_path='embeddings.db', # path to .db file of choice
)
```
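
The call above is the tail of the `model.embed_dataset(...)` example; the remaining arguments are elided here. As a purely hypothetical sketch of reading such a file back (the real table and column names are defined by `embed_dataset` and may differ), the standard-library `sqlite3` module is enough:

```python
import sqlite3
import numpy as np

# Hypothetical schema: one row per sequence, with the embedding stored as a binary blob.
with sqlite3.connect("embeddings.db") as conn:
    rows = conn.execute("SELECT sequence, embedding FROM embeddings").fetchall()

# Assumes float32 blobs; adjust dtype/shape to however embed_dataset actually serializes them.
embeddings = {seq: np.frombuffer(blob, dtype=np.float32) for seq, blob in rows}
```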
## Model probes
We employ linear probing techniques on various PLMs and standard datasets, similar to our previous [paper](https://www.biorxiv.org/content/10.1101/2024.07.30.605924v1), to assess the intrinsic correlation between pooled hidden states and valuable properties. ESMC (and thus ESM++) perform very well.

The plot below showcases performance normalized between the negative control (random vector embeddings) and the best performer. Classification task scores are the average of MCC and F1 (or F1max for multilabel), and regression task scores are the average of Spearman rho and R².


## Comparison of half precisions
Presumably because we trained in fp16 mixed precision, fp16 outputs are closer to those of the fp32 weights than bf16 outputs are. Therefore, we recommend loading in fp16.
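
A minimal loading sketch in fp16 (the repository id is taken from the citation URL below, and the `trust_remote_code` flag is an assumption; check the model page for the exact usage):

```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "Synthyra/FastESM2"  # assumed from the citation URL; verify on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 recommended: closest to the fp32 reference outputs
    trust_remote_code=True,
).eval()
```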
Requires PyTorch 2.5+ for the most savings, see [SDPA](https://pytorch.org/docs/stable/generated/torch.nn.functional.scaled_dot_product_attention.html).
### Citation
```
@misc {FastESM2,
    author = { Hallee, L. and Bichara, D. and Gleghorn, J. P. },
    title = { FastESM2 },
    year = 2024,
    url = { https://huggingface.co/Synthyra/FastESM2 },
    doi = { 10.57967/hf/3729 },
    publisher = { Hugging Face }
}
```