---
license: bigscience-openrail-m
widget:
- text: M[MASK]LWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
datasets:
- Ensembl
pipeline_tag: fill-mask
tags:
- biology
- medical
---

# BERT base for proteins
This is a bidirectional transformer pretrained on amino-acid sequences of human proteins.

Example: Insulin (UniProt accession P01308)
```
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
```

The model was trained using the masked-language-modeling objective.
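As a rough illustration of what a masked-language-modeling input looks like, the sketch below hides one residue of the insulin sequence behind a mask token. It assumes the mask string is `[MASK]`, as in the widget example of this card; the helper name `mask_residue` is hypothetical and not part of the model's code.

```python
from typing import Tuple

MASK_TOKEN = "[MASK]"  # assumption: same mask string as in the widget example


def mask_residue(sequence: str, position: int) -> Tuple[str, str]:
    """Replace the residue at `position` (0-based) with the mask token.

    Returns the masked sequence and the residue that was hidden,
    i.e. the target the model would be asked to recover.
    """
    if not 0 <= position < len(sequence):
        raise IndexError("position is outside the sequence")
    masked = sequence[:position] + MASK_TOKEN + sequence[position + 1:]
    return masked, sequence[position]


insulin = (
    "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"
    "GPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
)

# Hide the second residue, which reproduces the start of the widget text
masked, target = mask_residue(insulin, 1)
print(masked[:12], target)
```

A string masked this way can presumably be passed to the `fill-mask` pipeline from `transformers` with this checkpoint, which would return candidate residues for the hidden position.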

## Intended uses
This model is primarily intended to be fine-tuned for the following tasks:
- protein function
- molecule-to-gene-expression mapping
- cell targeting

## How to use in your code
```python
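from transformers import BertTokenizerFast, BertModel

# Load the tokenizer and the pretrained encoder from the Hugging Face Hub
checkpoint = 'unikei/bert-base-proteins'
tokenizer = BertTokenizerFast.from_pretrained(checkpoint)
model = BertModel.from_pretrained(checkpoint)

# Insulin (P01308) as a plain amino-acid string
example = 'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN'
tokens = tokenizer(example, return_tensors='pt')
# BertModel returns hidden states (last_hidden_state, pooler_output), not mask predictions
predictions = model(**tokens)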
```
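For fine-tuning or similarity search, a single fixed-size vector per protein is often more convenient than per-token hidden states. One common option, shown here as a minimal sketch and not as the author's recommended method, is to average the token vectors while ignoring padding. The helper `mean_pool` is hypothetical, and the tensors below are dummy values standing in for real model output.

```python
import torch


def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors over the sequence axis, skipping padded positions.

    last_hidden_state: (batch, seq_len, hidden)
    attention_mask:    (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1.0)                          # avoid division by zero
    return summed / counts


# Dummy batch: one sequence, three positions, the last one is padding
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = torch.tensor([[1, 1, 0]])

pooled = mean_pool(hidden, mask)
print(pooled.shape)
```

With real model output, the call would be `mean_pool(predictions.last_hidden_state, tokens['attention_mask'])`, giving one vector per input sequence. Note that this average also includes any special tokens the tokenizer adds, which may or may not be desirable.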