@@ -10,19 +10,19 @@ library_name: multimolecule
 pipeline_tag: fill-mask
 mask_token: "<mask>"
 widget:
-  - example_title: "PRNP"
-    text: "CTG<mask>AAGCGGCCCACGCGGACTGACGGGCGGGGG"
     output:
-      - label: "GGC"
-        score: 0.09496457129716873
-      - label: "GAG"
-        score: 0.09480331838130951
-      - label: "GAC"
-        score: 0.07397700101137161
-      - label: "AAG"
-        score: 0.07375374436378479
-      - label: "GUG"
-        score: 0.06565868109464645
 ---
 # mRNA-FM
@@ -94,7 +94,7 @@ RNA-FM is a [bert](https://huggingface.co/google-bert/bert-base-uncased)-style m
 - **Paper**: [Interpretable RNA Foundation Model from Unannotated Data for Highly Accurate RNA Structure and Function Predictions](https://doi.org/10.1101/2022.08.06.503062)
 - **Developed by**: Jiayang Chen, Zhihang Hu, Siqi Sun, Qingxiong Tan, Yixuan Wang, Qinze Yu, Licheng Zong, Liang Hong, Jin Xiao, Tao Shen, Irwin King, Yu Li
 - **Model type**: [BERT](https://huggingface.co/google-bert/bert-base-uncased) - [ESM](https://huggingface.co/facebook/esm2_t48_15B_UR50D)
-- **Original Repository**: [https://github.com/ml4bio/RNA-FM](https://github.com/ml4bio/RNA-FM)
 ## Usage
@@ -111,29 +111,29 @@ You can use this model directly with a pipeline for masked language modeling:
 ```python
 >>> import multimolecule  # you must import multimolecule to register models
 >>> from transformers import pipeline
->>> unmasker = pipeline('fill-mask', model='multimolecule/mrnafm')
->>> unmasker("ctg<mask>aagcggcccacgcggactgacgggcggggg")
-[{'score': 0.09496457129716873,
-  'token': 67,
-  'token_str': 'GGC',
-  'sequence': 'CUG GGC AAG CGG CCC ACG CGG ACU GAC GGG CGG GGG'},
- {'score': 0.09480331838130951,
-  'token': 58,
-  'token_str': 'GAG',
-  'sequence': 'CUG GAG AAG CGG CCC ACG CGG ACU GAC GGG CGG GGG'},
- {'score': 0.07397700101137161,
-  'token': 57,
-  'token_str': 'GAC',
-  'sequence': 'CUG GAC AAG CGG CCC ACG CGG ACU GAC GGG CGG GGG'},
- {'score': 0.07375374436378479,
-  'token': 8,
-  'token_str': 'AAG',
-  'sequence': 'CUG AAG AAG CGG CCC ACG CGG ACU GAC GGG CGG GGG'},
- {'score': 0.06565868109464645,
-  'token': 73,
-  'token_str': 'GUG',
-  'sequence': 'CUG GUG AAG CGG CCC ACG CGG ACU GAC GGG CGG GGG'}]
 ```
 ### Downstream Use
@@ -146,11 +146,11 @@ Here is how to use this model to get the features of a given sequence in PyTorch
 from multimolecule import RnaTokenizer, RnaFmModel
-tokenizer = RnaTokenizer.from_pretrained('multimolecule/mrnafm')
-model = RnaFmModel.from_pretrained('multimolecule/mrnafm')
 text = "UAGCUUAUCAGACUGAUGUUGA"
-input = tokenizer(text, return_tensors='pt')
 output = model(**input)
 ```
@@ -166,17 +166,17 @@ import torch
 from multimolecule import RnaTokenizer, RnaFmForSequencePrediction
-tokenizer = RnaTokenizer.from_pretrained('multimolecule/mrnafm')
-model = RnaFmForSequencePrediction.from_pretrained('multimolecule/mrnafm')
 text = "UAGCUUAUCAGACUGAUGUUGA"
-input = tokenizer(text, return_tensors='pt')
 label = torch.tensor([1])
 output = model(**input, labels=label)
 ```
-#### Nucleotide Classification / Regression
 **Note**: This model is not fine-tuned for any specific task. You will need to fine-tune the model on a downstream task to use it for nucleotide classification or regression.
@@ -184,14 +184,14 @@ Here is how to use this model as backbone to fine-tune for a nucleotide-level ta
 ```python
 import torch
-from multimolecule import RnaTokenizer, RnaFmForNucleotidePrediction
-tokenizer = RnaTokenizer.from_pretrained('multimolecule/mrnafm')
-model = RnaFmForNucleotidePrediction.from_pretrained('multimolecule/mrnafm')
 text = "UAGCUUAUCAGACUGAUGUUGA"
-input = tokenizer(text, return_tensors='pt')
 label = torch.randint(2, (len(text), ))
 output = model(**input, labels=label)
@@ -208,11 +208,11 @@ import torch
 from multimolecule import RnaTokenizer, RnaFmForContactPrediction
-tokenizer = RnaTokenizer.from_pretrained('multimolecule/mrnafm')
-model = RnaFmForContactPrediction.from_pretrained('multimolecule/mrnafm')
 text = "UAGCUUAUCAGACUGAUGUUGA"
-input = tokenizer(text, return_tensors='pt')
 label = torch.randint(2, (len(text), len(text)))
 output = model(**input, labels=label)

 pipeline_tag: fill-mask
 mask_token: "<mask>"
 widget:
+  - example_title: "Homo sapiens PRNP mRNA for prion"
+    text: "AGC<mask>CAUUAUGGCGAACCUUGGCUGCUG"
     output:
+      - label: "AAA"
+        score: 0.05433480441570282
+      - label: "AUC"
+        score: 0.04437034949660301
+      - label: "AAU"
+        score: 0.03882088139653206
+      - label: "ACA"
+        score: 0.037016965448856354
+      - label: "ACC"
+        score: 0.03563101962208748
 ---
 # mRNA-FM
 - **Paper**: [Interpretable RNA Foundation Model from Unannotated Data for Highly Accurate RNA Structure and Function Predictions](https://doi.org/10.1101/2022.08.06.503062)
 - **Developed by**: Jiayang Chen, Zhihang Hu, Siqi Sun, Qingxiong Tan, Yixuan Wang, Qinze Yu, Licheng Zong, Liang Hong, Jin Xiao, Tao Shen, Irwin King, Yu Li
 - **Model type**: [BERT](https://huggingface.co/google-bert/bert-base-uncased) - [ESM](https://huggingface.co/facebook/esm2_t48_15B_UR50D)
+- **Original Repository**: [ml4bio/RNA-FM](https://github.com/ml4bio/RNA-FM)
 ## Usage
 ```python
 >>> import multimolecule  # you must import multimolecule to register models
 >>> from transformers import pipeline
+>>> unmasker = pipeline("fill-mask", model="multimolecule/mrnafm")
+>>> unmasker("agc<mask>cauuauggcgaaccuuggcugcug")
+[{'score': 0.05433480441570282,
+  'token': 6,
+  'token_str': 'AAA',
+  'sequence': 'AGC AAA CAU UAU GGC GAA CCU UGG CUG CUG'},
+ {'score': 0.04437034949660301,
+  'token': 22,
+  'token_str': 'AUC',
+  'sequence': 'AGC AUC CAU UAU GGC GAA CCU UGG CUG CUG'},
+ {'score': 0.03882088139653206,
+  'token': 9,
+  'token_str': 'AAU',
+  'sequence': 'AGC AAU CAU UAU GGC GAA CCU UGG CUG CUG'},
+ {'score': 0.037016965448856354,
+  'token': 11,
+  'token_str': 'ACA',
+  'sequence': 'AGC ACA CAU UAU GGC GAA CCU UGG CUG CUG'},
+ {'score': 0.03563101962208748,
+  'token': 12,
+  'token_str': 'ACC',
+  'sequence': 'AGC ACC CAU UAU GGC GAA CCU UGG CUG CUG'}]
 ```
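The `sequence` strings in the fill-mask output above show how the input is rendered back. The lowercase, DNA-style input is upper-cased, `T` becomes `U`, and the result is split into space-separated codons (3-mers), with the predicted codon filling the `<mask>` slot. The pure-Python sketch below reproduces only that formatting. It assumes codon-level tokenization, as the outputs suggest. `to_codons` and `render_fill` are hypothetical helpers, not part of `multimolecule` or `transformers`, and they do not run the model.

```python
MASK = "<mask>"


def to_codons(seq: str) -> list[str]:
    """Upper-case, convert DNA T to RNA U, and split into 3-mers.

    Assumes len(seq) is a multiple of 3. Any trailing remainder is kept as a
    shorter final chunk rather than dropped.
    """
    rna = seq.upper().replace("T", "U")
    return [rna[i:i + 3] for i in range(0, len(rna), 3)]


def render_fill(masked: str, codon: str) -> str:
    """Hypothetical helper: put a predicted codon in place of the single <mask>
    and format the result like the pipeline's 'sequence' field."""
    # Split before upper-casing so the lowercase "<mask>" marker is still found.
    # Exactly one mask is assumed; otherwise the unpacking raises ValueError.
    prefix, suffix = masked.split(MASK)
    tokens = to_codons(prefix) + [codon] + to_codons(suffix)
    return " ".join(tokens)


print(render_fill("agc<mask>cauuauggcgaaccuuggcugcug", "AAA"))
# AGC AAA CAU UAU GGC GAA CCU UGG CUG CUG
```

The same helper also matches the old-side example: filling `ctg<mask>aagcggcccacgcggactgacgggcggggg` with `GGC` yields `CUG GGC AAG CGG CCC ACG CGG ACU GAC GGG CGG GGG`.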
 ### Downstream Use
 from multimolecule import RnaTokenizer, RnaFmModel
+tokenizer = RnaTokenizer.from_pretrained("multimolecule/mrnafm")
+model = RnaFmModel.from_pretrained("multimolecule/mrnafm")
 text = "UAGCUUAUCAGACUGAUGUUGA"
+input = tokenizer(text, return_tensors="pt")
 output = model(**input)
 ```
 from multimolecule import RnaTokenizer, RnaFmForSequencePrediction
+tokenizer = RnaTokenizer.from_pretrained("multimolecule/mrnafm")
+model = RnaFmForSequencePrediction.from_pretrained("multimolecule/mrnafm")
 text = "UAGCUUAUCAGACUGAUGUUGA"
+input = tokenizer(text, return_tensors="pt")
 label = torch.tensor([1])
 output = model(**input, labels=label)
 ```
+#### Token Classification / Regression
 **Note**: This model is not fine-tuned for any specific task. You will need to fine-tune the model on a downstream task to use it for nucleotide classification or regression.
 ```python
 import torch
+from multimolecule import RnaTokenizer, RnaFmForTokenPrediction
+tokenizer = RnaTokenizer.from_pretrained("multimolecule/mrnafm")
+model = RnaFmForTokenPrediction.from_pretrained("multimolecule/mrnafm")
 text = "UAGCUUAUCAGACUGAUGUUGA"
+input = tokenizer(text, return_tensors="pt")
 label = torch.randint(2, (len(text), ))
 output = model(**input, labels=label)
 from multimolecule import RnaTokenizer, RnaFmForContactPrediction
+tokenizer = RnaTokenizer.from_pretrained("multimolecule/mrnafm")
+model = RnaFmForContactPrediction.from_pretrained("multimolecule/mrnafm")
 text = "UAGCUUAUCAGACUGAUGUUGA"
+input = tokenizer(text, return_tensors="pt")
 label = torch.randint(2, (len(text), len(text)))
 output = model(**input, labels=label)