AmelieSchreiber committed on
Commit 548fef3 · Parent(s): bef527d
Update README.md

README.md CHANGED
@@ -1,9 +1,11 @@
 ---
 widget:
-- text: "
+- text: "MEPLDDLDLLLLEEDSGAEAVPRMEILQKKADAFFAETVLSRGVDNRYLVLAVETKLNERGAEEKHLLITVSQEGEQEVLCILRNGWSSVPVEPGDIIHIEGDCTSEPWIVDDDFGYFILSPDMLISGTSVASSIRCLRRAVLSETFRVSDTATRQMLIGTILHEVFQKAISESFAPEKLQELALQTLREVRHLKEMYRLNLSQDEVRCEVEEYLPSFSKWADEFMHKGTKAEFPQMHLSLPSDSSDRSSPCNIEVVKSLDIEESIWSPRFGLKGKIDVTVGVKIHRDCKTKYKIMPLELKTGKESNSIEHRGQVILYTLLSQERREDPEAGWLLYLKTGQMYPVPANHLDKRELLKLRNQLAFSLLHRVSRAAAGEEARLLALPQIIEEEKTCKYCSQMGNCALYSRAVEQVHDTSIPEGMRSKIQEGTQHLTRAHLKYFSLWCLMLTLESQSKDTKKSHQSIWLTPASKLEESGNCIGSLVRTEPVKRVCDGHYLHNFQRKNGPMPATNLMAGDRIILSGEERKLFALSKGYVKRIDTAAVTCLLDRNLSTLPETTLFRLDREEKHGDINTPLGNLSKLMENTDSSKRLRELIIDFKEPQFIAYLSSVLPHDAKDTVANILKGLNKPQRQAMKKVLLSKDYTLIVGMPGTGKTTTICALVRILSACGFSVLLTSYTHSAVDNILLKLAKFKIGFLRLGQSHKVHPDIQKFTEEEMCRLRSIASLAHLEELYNSHPVVATTCMGISHPMFSRKTFDFCIVDEASQISQPICLGPLFFSRRFVLVGDHKQLPPLVLNREARALGMSESLFKRLERNESAVVQLTIQYRMNRKIMSLSNKLTYEGKLECGSDRVANAVITLPNLKDVRLEFYADYSDNPWLAGVFEPDNPVCFLNTDKVPAPEQIENGGVSNVTEARLIVFLTSTFIKAGCSPSDIGIIAPYRQQLRTITDLLARSSVGMVEVNTVDKYQGRDKSLILVSFVRSNEDGTLGELLKDWRRLNVAITRAKHKLILLGSVSSLKRFPPLEKLFDHLNAEQLISNLPSREHESLYHILGDCQRD"
   example_title: "Protein Sequence 1"
-- text: "
+- text: "MNSVTVSHAPYYIVYHDDWEPVMSQLVEFYNEVASWLLRDETSPIPPKFFIQLKQMLRNKRVCVCGILPYPIDGTGVPFESPNFTKKSIKEIASSISRLTGVIDYKGYNLNIIDGVIPWNYYLSCKLGETKSHAIYWDKISKLLLQHITKHVSVLYCLGKTDFSNIRAKLESPVTTIVGYHPAARDRQFEKDRSFEIINELLELDNKVPINWAQGFIY"
   example_title: "Protein Sequence 2"
+- text: "MNSVTVSHAPYTIAYHDDWEPVMSQLVEFYNEAASWLLRDETSPIPSKFNIQLKQPLRNKRVCVFGIDPYPKDGTGVPFESPNFTKKSIKEIASSISRLMGVIDYEGYNLNIIDGVIPWNYYLSCKLGETKSHAIYWDKISKLLLQHITKHVSVLYCLGKTDFSNIRAKLESPVTTIVGYHPSARDRQFEKDRSFEIINVLLELDNKVPLNWAQGFIY"
+  example_title: "Protein Sequence 3"
 license: mit
 datasets:
 - AmelieSchreiber/general_binding_sites
@@ -26,7 +28,12 @@ tags:
 
 This model is trained to predict general binding sites of proteins using on the sequence. This is a finetuned version of
 `esm2_t6_8M_UR50D`, trained on [this dataset](https://huggingface.co/datasets/AmelieSchreiber/general_binding_sites). The data is
-not filtered by family, and thus the model may be overfit to some degree.
+not filtered by family, and thus the model may be overfit to some degree. In the Hugging Face Inference API widget to the right
+there are three protein sequence examples. The first is a DNA binding protein, the second and third were obtained using [EvoProtGrad](https://github.com/Amelie-Schreiber/sampling_protein_language_models/blob/main/EvoProtGrad_copy.ipynb)
+a Markov Chain Monte Carlo method of (in silico) directed evolution of proteins based on a form of Gibbs sampling. The mutatant-type
+protein sequences in theory should have similar binding sites to the wild-type protein sequence, but perhaps with higher binding affinity.
+Testing this out on the model, we see the two proteins indeed have the same binding sites, which validates to some degree that the model
+has learned to predict binding sites well (and that EvoProtGrad works as intended).
 
 ## Training
 
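The new README text checks by eye that the EvoProtGrad variants keep the same predicted binding sites. That check can be made explicit. Below is a minimal Python sketch. It assumes you already have per-residue 0/1 labels (1 = binding site) for each sequence, for example from the widget or a token-classification run. How those labels are produced is outside the sketch. The helper names `mutated_positions` and `binding_site_agreement` are hypothetical and are not part of the model card or any library.

```python
from collections.abc import Sequence


def mutated_positions(wild_type: str, mutant: str) -> list[int]:
    """Return 0-based indices where two equal-length sequences differ.

    Assumes substitution-only variants (no insertions/deletions), so the
    sequences can be compared position by position without alignment.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("sequences must have the same length (no indels)")
    return [i for i, (a, b) in enumerate(zip(wild_type, mutant)) if a != b]


def binding_site_agreement(
    labels_a: Sequence[int], labels_b: Sequence[int]
) -> dict[str, object]:
    """Compare two per-residue binary predictions (1 = binding site)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    sites_a = {i for i, x in enumerate(labels_a) if x == 1}
    sites_b = {i for i, x in enumerate(labels_b) if x == 1}
    union = sites_a | sites_b
    # Jaccard index of the two site sets; defined as 1.0 when neither has sites.
    jaccard = len(sites_a & sites_b) / len(union) if union else 1.0
    return {
        "identical": sites_a == sites_b,
        "only_in_a": sorted(sites_a - sites_b),
        "only_in_b": sorted(sites_b - sites_a),
        "jaccard": jaccard,
    }


if __name__ == "__main__":
    # Toy inputs for illustration only, not model outputs.
    print(mutated_positions("MKTAYIAKQR", "MKTAYLAKQR"))
    print(binding_site_agreement([0, 1, 1, 0], [0, 1, 0, 1]))
```

Reporting the positions found in only one set makes disagreements visible. Reporting the Jaccard index gives a single similarity score. Both are more informative than a yes/no "same binding sites" judgment when the variants are compared.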