---
license: bigscience-openrail-m
widget:
- text: M[MASK]LWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
datasets:
- Ensembl
pipeline_tag: fill-mask
tags:
- biology
- medical
---

# BERT base for proteins

This is a bidirectional transformer pretrained on amino-acid sequences of human proteins. Example: Insulin (P01308)

```
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
```

The model was trained with the masked-language-modeling objective: randomly masked residues are predicted from their surrounding sequence context.

## Intended uses

This model is primarily intended to be fine-tuned on downstream tasks such as:

- protein function prediction
- molecule-to-gene-expression mapping
- cell targeting

A fine-tuning sketch follows the usage example below.

## How to use in your code

```python
from transformers import BertTokenizerFast, BertModel

checkpoint = 'unikei/bert-base-proteins'
tokenizer = BertTokenizerFast.from_pretrained(checkpoint)
model = BertModel.from_pretrained(checkpoint)

# Tokenize the insulin sequence and run a forward pass;
# outputs.last_hidden_state holds one contextual embedding per token.
example = 'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN'
tokens = tokenizer(example, return_tensors='pt')
outputs = model(**tokens)
```
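Because the card's pipeline tag is `fill-mask`, the pretraining objective can also be exercised directly. Below is a minimal sketch using the Transformers `fill-mask` pipeline, assuming the checkpoint ships the masked-LM head weights; the masked insulin sequence mirrors the widget example above:

```python
from transformers import pipeline

# Assumption: the checkpoint includes the masked-LM head used during pretraining.
unmasker = pipeline('fill-mask', model='unikei/bert-base-proteins')

# Second residue of insulin (P01308) replaced by the BERT mask token.
masked = 'M[MASK]LWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN'

# Print the top candidate residues with their scores.
for candidate in unmasker(masked, top_k=5):
    print(candidate['token_str'], round(candidate['score'], 4))
```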
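For the intended uses listed above, the encoder would typically be loaded under a task-specific head and fine-tuned. Here is a minimal, hypothetical sketch for a sequence-level task such as protein function classification; `num_labels=2` and the single example sequence are illustrative placeholders, not part of this model card:

```python
from transformers import BertTokenizerFast, BertForSequenceClassification

checkpoint = 'unikei/bert-base-proteins'
tokenizer = BertTokenizerFast.from_pretrained(checkpoint)

# Wrap the pretrained encoder with a fresh classification head.
# num_labels=2 is a placeholder for an illustrative binary function label.
model = BertForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

sequence = 'MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN'
inputs = tokenizer(sequence, return_tensors='pt')
outputs = model(**inputs)
print(outputs.logits.shape)  # (1, num_labels); the head is untrained until fine-tuned
```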