---
base_model: westlake-repl/SaProt_35M_AF2
library_name: peft
---
# Base model: [westlake-repl/SaProt_35M_AF2](https://huggingface.co/westlake-repl/SaProt_35M_AF2)

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->
This model is trained on a single-site deep mutational scanning dataset and
can be used to predict the fitness score of mutant amino acid sequences of the protein AsCas12f.
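
As a minimal sketch of what a single-site mutant looks like as model input, the helper below applies one substitution to a wild-type sequence. The function name `apply_single_mutation`, the `K3R`-style mutation code, and the toy sequence are assumptions for illustration; they are not taken from the dataset or the training code.

```python
import re


def apply_single_mutation(wild_type: str, mutation: str) -> str:
    """Apply one substitution such as 'K3R' (1-based position) to a sequence."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation code: {mutation!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(wild_type):
        raise ValueError(f"position {pos} is outside the sequence")
    if wild_type[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {wild_type[pos - 1]}")
    # Replace exactly one residue and leave the rest of the sequence untouched.
    return wild_type[: pos - 1] + alt + wild_type[pos:]


# Toy sequence for illustration only; this is not the AsCas12f sequence.
toy_wild_type = "MAKNT"
print(apply_single_mutation(toy_wild_type, "K3R"))
```

Checking the reference residue before substituting catches off-by-one errors in position numbering, which are easy to make when mutation lists and sequences come from different sources.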

## Protein Function
AsCas12a is widely used as a genome-editing tool in human cells.


### Task type
Protein-level regression

### Dataset description
The dataset is from [An AsCas12f-based compact genome-editing tool derived by deep mutational scanning and structural analysis](https://doi.org/10.1016/j.cell.2023.08.031).


The label is the fitness score of each mutant amino acid sequence.
Scores range from negative infinity to positive infinity, and the wild-type sequence has a fitness of 1.
A score greater than 1 indicates higher fitness than the wild type; a score smaller than 1 indicates lower fitness.
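
The label convention can be sketched as a small helper that compares a score against the wild-type reference of 1. The names `WILD_TYPE_FITNESS` and `describe_fitness`, and the example scores, are hypothetical and only illustrate how to read the labels.

```python
WILD_TYPE_FITNESS = 1.0  # reference value stated in this model card


def describe_fitness(score: float) -> str:
    """Describe a fitness score relative to the wild-type reference."""
    if score > WILD_TYPE_FITNESS:
        return "higher than wild type"
    if score < WILD_TYPE_FITNESS:
        return "lower than wild type"
    return "equal to wild type"


# Made-up scores for illustration; not values from the dataset.
for score in (1.4, 1.0, -0.3):
    print(score, describe_fitness(score))
```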

### Model input type
Amino acid sequence
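
SaProt models use a structure-aware vocabulary in which each residue is paired with a structure token. A minimal sketch for preparing a plain amino acid sequence, assuming the `#` placeholder is used when no structure is available (an assumption about the base model's convention, worth checking against the SaProt documentation), could look like this. The function name `to_structure_aware` is hypothetical.

```python
def to_structure_aware(aa_sequence: str, structure_placeholder: str = "#") -> str:
    """Pair every residue with a placeholder structure token.

    Assumption: '#' marks an unknown structure state for the base model.
    """
    return "".join(residue + structure_placeholder for residue in aa_sequence.upper())


# Toy sequence for illustration only.
print(to_structure_aware("MAKNT"))
```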
### Performance
 0.60 Spearman's ρ
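
For readers who want to reproduce this kind of metric on their own predictions, the following is a dependency-free sketch of Spearman's ρ, computed as the Pearson correlation of average ranks. It is not the evaluation code used for this model, and the function names and toy values are assumptions for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)


def spearman_rho(x, y):
    """Spearman's rank correlation between x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up predictions and labels for illustration only.
predicted = [0.2, 0.9, 1.1, 1.5]
measured = [0.1, 1.0, 0.8, 1.7]
print(round(spearman_rho(predicted, measured), 3))
```

Because the metric depends only on ranks, it rewards a model for ordering mutants correctly even when the predicted fitness values are on a different scale from the measured ones.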

### LoRA config
lora_dropout: 0.0

lora_alpha: 16

target_modules: ["query", "key", "value", "intermediate.dense", "output.dense"]

modules_to_save: ["classifier"]
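
The settings above can be expressed as a PEFT `LoraConfig`. This is a sketch rather than the exact configuration used for training: the LoRA rank `r` is not listed in this card, so it is left at the library default here, and no `task_type` is set.

```python
from peft import LoraConfig

# Values copied from the "LoRA config" section of this card.
# Assumption: the rank `r` is not stated, so the PEFT default applies.
lora_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["query", "key", "value", "intermediate.dense", "output.dense"],
    modules_to_save=["classifier"],
)

print(lora_config)
```

Listing `classifier` in `modules_to_save` keeps the regression head fully trainable and stores it alongside the adapter weights, since that head is not part of the pretrained base model.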

### Training config
class: AdamW

betas: (0.9, 0.98)

weight_decay: 0.01

learning rate: 1e-4

epoch: 50

batch size: 36

precision: 16-mixed
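
A minimal sketch of the optimizer settings in PyTorch is shown below. The `torch.nn.Linear` module is only a stand-in for the adapted model, and the surrounding training loop, data loading, and mixed-precision setup are not shown because this card does not describe them.

```python
import torch

# Stand-in module for illustration; the real model is the LoRA-adapted base model.
model = torch.nn.Linear(8, 1)

# Hyperparameters copied from the "Training config" section of this card.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,
    betas=(0.9, 0.98),
    weight_decay=0.01,
)

NUM_EPOCHS = 50   # "epoch: 50"
BATCH_SIZE = 36   # "batch size: 36"
# "precision: 16-mixed" corresponds to 16-bit mixed-precision training,
# which is configured in the training framework rather than in the optimizer.

print(optimizer)
```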