unikei committed on
Commit
0bda497
1 Parent(s): 456180f

Update README.md

  1. README.md +30 -2
README.md CHANGED
@@ -1,11 +1,39 @@
  ---
  license: bigscience-openrail-m
  widget:
- - text: M[MASK]ESSDKLYRVEYAKSGRASCKKCSESIPKDSLRMAIMVQSPMFDGKVPHWYHFSCFWKVGHSIRHPDVEVDGFSELRWDDQQKVKKTAEAGGVTGKGQDGIGSKAEKTLGDFAAEYAKSNRSTCKGCMEKIEKGQVRLSKKMVDPEKPQLGMIDRWYHPGCFVKNREELGFRPEYSASQLKGFSLLATEDKEALKKQLPGVKSEGK
  datasets:
  - Ensembl
  pipeline_tag: fill-mask
  tags:
  - biology
  - medical
- ---
  ---
  license: bigscience-openrail-m
  widget:
+ - text: M[MASK]LWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
  datasets:
  - Ensembl
  pipeline_tag: fill-mask
  tags:
  - biology
  - medical
+ ---
+
+ # BERT base for proteins
+ This is a bidirectional transformer pretrained on amino-acid sequences of human proteins.
+
+ Example: Insulin (P01308)
+ ```
+ MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
+ ```
+
+ The model was trained using the masked-language-modeling objective.
+
+ ## Intended uses
+ This model is primarily intended for fine-tuning on the following tasks:
+ - protein function
+ - molecule-to-gene-expression mapping
+ - cell targeting
+
+ ## How to use in your code
+ ```python
+ from transformers import BertTokenizerFast, BertModel
+ checkpoint = 'unikei/bert-base-proteins'
+ tokenizer = BertTokenizerFast.from_pretrained(checkpoint)
+ model = BertModel.from_pretrained(checkpoint)
+
+ example = 'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN'
+ tokens = tokenizer(example, return_tensors='pt')
+ predictions = model(**tokens)
+ ```
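
The README snippet above loads `BertModel`, which returns hidden states rather than residue predictions. Because the card declares `pipeline_tag: fill-mask` and the new widget text is the insulin sequence with its second residue replaced by `[MASK]`, a masked-residue query might look like the sketch below. The helper names `mask_residue` and `predict_masked_residue` are hypothetical and not part of the commit, and the sketch assumes the checkpoint ships a masked-language-modeling head that the standard `fill-mask` pipeline can load.

```python
def mask_residue(sequence: str, position: int, mask_token: str = "[MASK]") -> str:
    """Replace the residue at a 0-based position with the mask token."""
    if not 0 <= position < len(sequence):
        raise IndexError(f"position {position} is outside the sequence")
    return sequence[:position] + mask_token + sequence[position + 1:]


def predict_masked_residue(sequence: str, position: int,
                           checkpoint: str = "unikei/bert-base-proteins",
                           top_k: int = 5):
    """Return (candidate token, score) pairs for one masked position.

    Assumption: the checkpoint includes an MLM head usable by the
    'fill-mask' pipeline; this is inferred from the card's pipeline_tag.
    """
    # Imported lazily so mask_residue works without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=checkpoint)
    # Use the tokenizer's own mask token instead of hard-coding it.
    masked = mask_residue(sequence, position, fill_mask.tokenizer.mask_token)
    # With a single mask, the pipeline returns a list of candidate dicts.
    return [(c["token_str"], c["score"]) for c in fill_mask(masked, top_k=top_k)]
```

Calling `mask_residue(insulin, 1)` on the insulin sequence from the README reproduces the widget text, and `predict_masked_residue(insulin, 1)` would download the checkpoint and return the top candidates for that position.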