This model is a fine-tuned version of BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext. It was developed for the web-based ANDDigest system to classify short names of drugs and metabolites in texts on the basis of their context (a name is considered short if its length is 4 symbols or less). The analyzed name should be replaced in the text with the <andsystem-candidate> tag.
Input:
Any biomedical text in which the name of the object to be classified is replaced with the <andsystem-candidate> tag, for example, this PubMed abstract:
Intermittent obstruction of jejunostomy tube due to Ascaris lumbricoides infection. A 45-year-old Costa Rican woman was seen for a jejunostomy tube malfunction. There was no evidence of tube malposition or intestinal obstruction. During endoscopy, a long worm was retrieved from the distal duodenum; it was later confirmed to be Ascaris lumbricoides. After treatment with <andsystem-candidate>, no further episodes of tube occlusion were observed. This case reminds us of the importance of considering helminthic infections and their atypical manifestations in patients from endemic regions.
In this example, mebendazole was replaced with <andsystem-candidate>. Please keep in mind that the maximum input sequence length for BERT is 512 tokens.
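A minimal sketch of preparing such an input in Python, assuming the candidate name is already known and that every whole-word, case-insensitive occurrence of it should be masked (the card does not say whether one or all occurrences are replaced). The helper `mask_candidate` is hypothetical and is not part of ANDDigest.

```python
import re

TAG = "<andsystem-candidate>"


def mask_candidate(text: str, name: str, tag: str = TAG) -> str:
    """Replace every whole-word occurrence of `name` in `text` with `tag`.

    Assumption: matching is case-insensitive, and the look-around checks
    keep the name from matching inside a longer word (e.g. "ATP" in "ATPase").
    """
    pattern = re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE)
    # A function replacement inserts `tag` literally, with no escape processing.
    return pattern.sub(lambda _match: tag, text)


abstract = "After treatment with mebendazole, no further episodes of tube occlusion were observed."
print(mask_candidate(abstract, "mebendazole"))
# prints: After treatment with <andsystem-candidate>, no further episodes of tube occlusion were observed.
```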
Output:
LABEL_0 is the probability of a FALSE recognition, i.e. the context of <andsystem-candidate> does not correspond to the context typical of drugs or metabolites.
LABEL_1 is the probability of a TRUE recognition, i.e. the context of <andsystem-candidate> corresponds to the context typical of drugs or metabolites.
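If the model is run directly rather than through a pipeline, the two scores can be derived from its raw output logits. The sketch below assumes they are a softmax over the two logits, which is what the transformers text-classification pipeline applies by default to a two-label model; `label_probabilities` is a hypothetical helper.

```python
import math


def label_probabilities(logit_0: float, logit_1: float) -> dict:
    """Softmax over the two classification logits.

    Returns the probabilities under the label names used in this card:
    LABEL_0 (FALSE recognition) and LABEL_1 (TRUE recognition).
    """
    m = max(logit_0, logit_1)  # subtract the max for numerical stability
    e0 = math.exp(logit_0 - m)
    e1 = math.exp(logit_1 - m)
    total = e0 + e1
    return {"LABEL_0": e0 / total, "LABEL_1": e1 / total}


print(label_probabilities(-3.0, 5.0))
```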
The optimal LABEL_1 threshold for short names of drugs or metabolites, calculated using a gold standard (add link), is >= 0.999992847442627.
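A sketch of applying this threshold to a short candidate name. The helper `is_drug_or_metabolite` is hypothetical; it refuses names longer than 4 symbols because the card reports this threshold for short names only.

```python
SHORT_NAME_MAX_LEN = 4                    # "short" = 4 symbols or fewer (per this card)
SHORT_NAME_THRESHOLD = 0.999992847442627  # optimal LABEL_1 threshold for short names


def is_drug_or_metabolite(name: str, label_1_score: float) -> bool:
    """Accept a short candidate name only if its LABEL_1 score reaches the threshold."""
    if len(name) > SHORT_NAME_MAX_LEN:
        raise ValueError("this threshold was reported for short names only")
    return label_1_score >= SHORT_NAME_THRESHOLD


print(is_drug_or_metabolite("ATP", 0.99999999))  # above the threshold
print(is_drug_or_metabolite("ATP", 0.9999))      # below the threshold
```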
The Matthews correlation coefficient (MCC) of the model for long names (>= 15 symbols) is 0.983.
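For reference, the MCC is computed from the confusion matrix as (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). A plain-Python sketch of that formula (not the authors' evaluation code):

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when any row or column of the matrix is empty
    return numerator / denominator


print(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10))
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction).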
The ROC AUC of the model, calculated for short names (<= 4 symbols), is 0.907.
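ROC AUC equals the probability that a randomly chosen positive example receives a higher LABEL_1 score than a randomly chosen negative one, with ties counted as one half. The pairwise sketch below illustrates that definition; it is not the authors' evaluation code and is quadratic in the number of examples.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison of positive and negative scores.

    `scores` are LABEL_1 probabilities; `labels` are 1 (true drug or
    metabolite) or 0 (not a drug or metabolite).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.2, 0.5, 0.1], [1, 1, 0, 0]))  # 3 of 4 pairs ranked correctly
```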
Citing
If you find these models useful in your research, please cite the following articles:
Ivanisenko, T.V., Saik, O.V., Demenkov, P.S. et al. ANDDigest: a new web-based module of ANDSystem for the search of knowledge in the scientific literature. BMC Bioinformatics 21 (Suppl 11), 228 (2020). https://doi.org/10.1186/s12859-020-03557-8
Ivanisenko, T.V.; Demenkov, P.S.; Kolchanov, N.A.; Ivanisenko, V.A. The New Version of the ANDDigest Tool with Improved AI-Based Short Names Recognition. Int. J. Mol. Sci. 2022, 23, 14934. https://doi.org/10.3390/ijms232314934