---
base_model: westlake-repl/SaProt_650M_AF2
library_name: peft
---
# Base model: [westlake-repl/SaProt_650M_AF2](https://huggingface.co/westlake-repl/SaProt_650M_AF2)

# Model Card for Model ID

This model labels each residue of an amino acid sequence as part of a signal peptide (by type) or as part of a cytoplasmic, transmembrane, or extracellular region.

### Task type
Residue-level classification

### Dataset description
The dataset comes from [SignalP 6.0 predicts all five types of signal peptides using protein language models](https://www.nature.com/articles/s41587-021-01156-3).
It contains 7 classes:

| Label | Index | Class |
|---|---|---|
| S | 0 | Sec/SPI signal peptide |
| T | 1 | Tat/SPI or Tat/SPII signal peptide |
| L | 2 | Sec/SPII signal peptide |
| P | 3 | Sec/SPIII signal peptide |
| I | 4 | cytoplasm |
| M | 5 | transmembrane |
| O | 6 | extracellular |
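
To make the class indices easier to work with, the mapping listed above can be written as a small lookup. This is a minimal sketch: `ID2LABEL`, `decode_labels`, and `signal_peptide_length` are hypothetical helper names, not part of this model's code, and the prefix count assumes the signal peptide sits at the N-terminus of the sequence.

```python
# Index-to-letter mapping, copied from the class list in this card.
ID2LABEL = {0: "S", 1: "T", 2: "L", 3: "P", 4: "I", 5: "M", 6: "O"}

# Letters that denote any signal peptide type.
SIGNAL_PEPTIDE_LABELS = frozenset("STLP")


def decode_labels(class_ids):
    """Turn a list of per-residue class indices into a label string."""
    return "".join(ID2LABEL[i] for i in class_ids)


def signal_peptide_length(label_string):
    """Count the leading residues labelled as a signal peptide.

    Assumption: the signal peptide is an N-terminal prefix, so counting
    stops at the first residue with a non-signal-peptide label.
    """
    length = 0
    for label in label_string:
        if label not in SIGNAL_PEPTIDE_LABELS:
            break
        length += 1
    return length
```

For example, `decode_labels([0, 0, 0, 6, 6])` gives `"SSSOO"`, which `signal_peptide_length` reads as a signal peptide of three residues.
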
### Model input type
Amino acid sequence
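
The base model uses a structure-aware vocabulary in which each residue is paired with a structure token. A minimal sketch of preparing a plain amino acid sequence follows, assuming that `#` is the placeholder for a missing structure token in the base model's vocabulary; `aa_to_sa` is a hypothetical helper name, and the base model's documentation should be checked before relying on this format.

```python
def aa_to_sa(aa_sequence, placeholder="#"):
    """Pair every residue with a placeholder structure token.

    Assumption: the tokenizer expects residue/structure pairs and accepts
    `placeholder` where no structure information is available.
    """
    return "".join(residue + placeholder for residue in aa_sequence)
```

Under that assumption, `aa_to_sa("MEV")` produces `"M#E#V#"`.
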

### Performance
test_acc: 0.96
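
The card does not say how `test_acc` is computed. One plausible reading for a residue-level task is the fraction of residues whose predicted label matches the reference, pooled over all test sequences. The sketch below illustrates that reading only; `residue_accuracy` is a hypothetical name, and the actual evaluation may differ (for example in how padding or special tokens are handled).

```python
def residue_accuracy(predicted, reference):
    """Fraction of residues with a matching label, pooled over sequences.

    `predicted` and `reference` are lists of equal-length label strings,
    one string per protein.
    """
    if len(predicted) != len(reference):
        raise ValueError("number of sequences differs")
    correct = 0
    total = 0
    for pred, ref in zip(predicted, reference):
        if len(pred) != len(ref):
            raise ValueError("label strings differ in length")
        correct += sum(p == r for p, r in zip(pred, ref))
        total += len(ref)
    return correct / total if total else 0.0
```
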

### LoRA config
lora_dropout: 0.0

lora_alpha: 16

target_modules: ["query", "key", "value", "intermediate.dense", "output.dense"]

modules_to_save: ["classifier"]
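
The values above map onto the keyword arguments of `peft.LoraConfig`. A minimal sketch follows; `LORA_KWARGS` and `build_lora_config` are hypothetical names, and the LoRA rank `r` is not listed in this card, so the sketch leaves it at the PEFT default rather than guessing the trained value.

```python
# Values copied from the LoRA config section of this card.
LORA_KWARGS = {
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "target_modules": ["query", "key", "value", "intermediate.dense", "output.dense"],
    "modules_to_save": ["classifier"],
}


def build_lora_config():
    """Create a peft.LoraConfig from the values above.

    Assumption: the rank `r` is not stated in the card, so PEFT's default
    is used here; it may not match the rank of the published adapter.
    """
    from peft import LoraConfig  # lazy import: the dict is usable without peft

    return LoraConfig(**LORA_KWARGS)
```

When loading the published adapter, the `adapter_config.json` shipped with it is the authoritative source for these values, including the rank.
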

### Training config
optimizer class: AdamW

betas: (0.9, 0.98)

weight_decay: 0.01

learning rate: 1e-4

epochs: 10

batch size: 100

precision: 16-mixed
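
As a rough guide to reproducing this setup, the optimizer settings above correspond to the arguments of `torch.optim.AdamW`. The sketch below is an illustration under assumptions: `TRAIN_CONFIG` and `build_optimizer` are hypothetical names, and the training script itself is not part of this card. The `16-mixed` string matches the format PyTorch Lightning accepts for its `Trainer(precision=...)` argument, which suggests but does not confirm that Lightning was used.

```python
# Values copied from the training config section of this card.
TRAIN_CONFIG = {
    "lr": 1e-4,
    "betas": (0.9, 0.98),
    "weight_decay": 0.01,
    "max_epochs": 10,
    "batch_size": 100,
    "precision": "16-mixed",
}


def build_optimizer(parameters):
    """Create an AdamW optimizer with the hyperparameters listed above.

    `parameters` is an iterable of trainable tensors, for example the
    parameters of the LoRA-wrapped model that still require gradients.
    """
    import torch  # lazy import: the config dict is usable without torch

    return torch.optim.AdamW(
        parameters,
        lr=TRAIN_CONFIG["lr"],
        betas=TRAIN_CONFIG["betas"],
        weight_decay=TRAIN_CONFIG["weight_decay"],
    )
```
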