---
base_model: westlake-repl/SaProt_35M_AF2
library_name: peft
---
# Base model: [westlake-repl/SaProt_35M_AF2](https://huggingface.co/westlake-repl/SaProt_35M_AF2)

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->
This model is trained on a single-site deep mutational scanning dataset and
can be used to predict the fitness score of mutant amino acid sequences of the protein [RASH_HUMAN](https://www.uniprot.org/uniprotkb/P01112/entry) (GTPase HRas).

## Protein Function
This protein is involved in the activation of Ras protein signal transduction.
Ras proteins bind GDP/GTP and possess intrinsic GTPase activity.

### Task type
Protein-level regression

### Dataset description
The dataset is from [Deep generative models of genetic variation capture the effects of mutations](https://www.nature.com/articles/s41592-018-0138-4)
and is also available as a [SaProtHub dataset](https://huggingface.co/datasets/SaProtHub/DMS_RASH_HUMAN).

The label is the fitness score of each mutant amino acid sequence. It is unbounded (it can range from negative to positive infinity), with 0 corresponding to the wild type;
larger values indicate higher fitness.

### Model input type
Amino acid sequence

### Performance
0.64 Spearman's ρ
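
For readers unfamiliar with the metric, Spearman's ρ is the Pearson correlation computed on the ranks of the predicted and measured scores, so it rewards getting the ordering of mutants right rather than the exact values. The following is a minimal dependency-free sketch of that calculation; the function names `average_ranks` and `spearman_rho` are illustrative and are not part of this model or its evaluation code.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the window over every value tied with the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(predicted, measured):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    rx, ry = average_ranks(predicted), average_ranks(measured)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)
```

Calling `spearman_rho(predicted_scores, measured_scores)` on two equally long lists returns a value in [-1, 1]; a value of 1 means the predicted ordering matches the measured ordering exactly.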

### LoRA config
lora_dropout: 0.0

lora_alpha: 16

target_modules: ["query", "key", "value", "intermediate.dense", "output.dense"]

modules_to_save: ["classifier"]
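
As a rough illustration, the values above could be expressed as a `peft.LoraConfig` along the following lines. This is a sketch under assumptions: the card does not state the LoRA rank, so `r` is left at the library default, and the helper name `build_lora_config` is hypothetical.

```python
# Values copied from the "LoRA config" section of this card.
LORA_KWARGS = {
    "lora_dropout": 0.0,
    "lora_alpha": 16,
    "target_modules": ["query", "key", "value", "intermediate.dense", "output.dense"],
    "modules_to_save": ["classifier"],
}


def build_lora_config():
    """Build a peft LoraConfig from the values listed on this card.

    Assumption: the rank `r` is not given on the card, so the peft
    default is used here; the adapter's own config file is authoritative.
    """
    from peft import LoraConfig  # imported lazily so the dict is usable without peft

    return LoraConfig(**LORA_KWARGS)
```

When loading the published adapter rather than training a new one, the configuration stored alongside the adapter weights should be preferred over this reconstruction.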

### Training config
class: AdamW

betas: (0.9, 0.98)

weight_decay: 0.01

learning rate: 1e-4

epoch: 50

batch size: 256

precision: 16-mixed
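
The optimizer settings above map directly onto `torch.optim.AdamW`. The sketch below assumes PyTorch is the training framework, which the card does not state explicitly; the helper name `build_optimizer` is hypothetical, and the epoch count, batch size, and precision would be set separately in whatever training loop or trainer is used.

```python
# Values copied from the "Training config" section of this card.
ADAMW_KWARGS = {
    "lr": 1e-4,
    "betas": (0.9, 0.98),
    "weight_decay": 0.01,
}


def build_optimizer(parameters):
    """Create an AdamW optimizer over `parameters` with the card's settings.

    Assumption: `parameters` is an iterable of trainable tensors, for
    example the LoRA and classifier weights of the wrapped model.
    """
    import torch  # imported lazily so the dict is usable without torch

    return torch.optim.AdamW(parameters, **ADAMW_KWARGS)
```

Passing only the parameters with `requires_grad=True` keeps the frozen base-model weights out of the optimizer state.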