IVN-RIN/MedPsyNIT

@@ -13,14 +13,25 @@ metrics:
 library_name: transformers
 ---
-# Advancing Italian Biomedical Information Extraction with Large Language Models: Methodological Insights and Multicenter Practical Application
-Manuscript available at [arxiv.org/abs/2306.05323](https://arxiv.org/abs/2306.05323)
-## Abstract
-The introduction of computerized medical records in hospitals has reduced burdensome activities like manual writing and information fetching. However, the data contained in medical records are still far underutilized, primarily because extracting data from unstructured textual medical records takes time and effort. Information Extraction, a subfield of Natural Language Processing, can help clinical practitioners overcome this limitation by using automated text-mining pipelines. In this work, we created the first Italian neuropsychiatric Named Entity Recognition dataset, PsyNIT, and used it to develop a Large Language Model for this task. Moreover, we collected and leveraged three external independent datasets to implement an effective multicenter model, with overall F1-score 84.77%, Precision 83.16%, Recall 86.44%. The lessons learned are: (i) the crucial role of a consistent annotation process and (ii) a fine-tuning strategy that combines classical methods with a "few-shot" approach. This allowed us to establish methodological guidelines that pave the way for Natural Language Processing studies in less-resourced languages.
-*Keywords*: Natural Language Processing | Deep Learning | Biomedical Text Mining | Large Language Model | Transformer
-*Correspondence*: ccrema@fatebenefratelli.eu

+🤗 + 📚🩺🇮🇹 + 📖✍🏻🧑‍⚕️ =  **MedPsyNIT**
+From this repository you can download the **[MedPsyNIT](https://www.sciencedirect.com/science/article/pii/S1532046423002782)** (Medical Psychiatric Ner for ITalian) checkpoint.
+**MedPsyNIT** is built on top of [BioBIT](https://huggingface.co/IVN-RIN/bioBIT) and fine-tuned on a native Italian NER (Named Entity Recognition) dataset collected from four Italian hospitals.
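A minimal sketch of how the checkpoint might be loaded with the `transformers` token-classification pipeline. The model id `IVN-RIN/MedPsyNIT` is assumed from this repository's name, and `load_ner_pipeline` and `group_entities` are hypothetical helper names introduced only for illustration. The label strings returned by the model depend on the checkpoint's configuration and are not assumed here.

```python
from collections import defaultdict

# Assumption: the Hub id matches this repository's name.
MODEL_ID = "IVN-RIN/MedPsyNIT"


def load_ner_pipeline(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads the checkpoint on first use)."""
    # Imported lazily so that group_entities can be used without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges sub-word tokens into entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def group_entities(predictions):
    """Group aggregated pipeline predictions by their predicted label."""
    grouped = defaultdict(list)
    for pred in predictions:
        grouped[pred["entity_group"]].append(pred["word"])
    return dict(grouped)
```

With network access, calling `ner = load_ner_pipeline()` and then `group_entities(ner(text))` on an Italian clinical sentence should return the extracted spans keyed by entity class.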
+The entity classes in the dataset are:
+- Diagnosis and comorbidities (779 examples, 13.23% of the dataset)
+- Cognitive symptoms (2386 examples, 40.52% of the dataset)
+- Neuropsychiatric symptoms (707 examples, 12.01% of the dataset)
+- Drug treatment (162 examples, 2.75% of the dataset)
+- Medical assessment (1854 examples, 31.49% of the dataset)
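The percentages above follow directly from the per-class counts. The short sketch below recomputes them, using only the counts listed in this card; the variable names are illustrative.

```python
# Per-class example counts as listed in the model card.
counts = {
    "Diagnosis and comorbidities": 779,
    "Cognitive symptoms": 2386,
    "Neuropsychiatric symptoms": 707,
    "Drug treatment": 162,
    "Medical assessment": 1854,
}

total = sum(counts.values())

# Share of each class, as a percentage rounded to two decimals.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]} examples, {share}% of the dataset")
```

The classes are clearly imbalanced: the largest one (Cognitive symptoms) has roughly 15 times as many examples as the smallest (Drug treatment), which is worth keeping in mind when reading per-class results.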
+We designed a set of experiments to mitigate annotation inconsistencies and to give the models the best possible generalization capabilities. The process highlighted a fundamental point: a multicenter model used out-of-the-box is not effective and would likely perform poorly. However, a few hundred high-quality, consistent examples, combined with a low-resource fine-tuning approach, can greatly enhance extraction quality. We believe that this finding can be applied to other medical institutions and clinical settings, paving the way for the development of biomedical NER models in less-resourced languages.
+More details are available in the paper.
+**MedPsyNIT** was evaluated during fine-tuning by splitting the dataset into train (90%) and test (10%) sets. The fine-tuning procedure was repeated ten times for each model, initializing each run with a different random state, in order to minimize the effect of randomness and to evaluate the models' stability.
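The evaluation protocol can be illustrated with a small sketch. This is not the authors' code: `split_train_test` is a hypothetical helper, the integer `examples` list stands in for the annotated dataset, and the sketch assumes that each run's random state also drives the 90/10 split (the card does not say whether the split itself changes between runs).

```python
import random


def split_train_test(examples, test_fraction=0.1, seed=0):
    """Shuffle with a seeded generator and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data standing in for the annotated examples.
examples = list(range(100))

# Ten runs, each with a different random state (assumed here to be seeds 0..9).
splits = [split_train_test(examples, seed=seed) for seed in range(10)]

for seed, (train, test) in enumerate(splits):
    print(f"run {seed}: {len(train)} train / {len(test)} test")
```

In a real experiment, the model would be fine-tuned on each `train` list and scored on the matching `test` list, and the ten scores would then be summarized to judge stability.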
+Here are the results, summarized by entity class:
+- Diagnosis and comorbidities: 76.12%
+- Cognitive symptoms: 73.01%
+- Neuropsychiatric symptoms: 77.78%
+- Drug treatment: 89.18%
+- Medical assessment: 89.59%