---
license: mit
datasets:
- disi-unibo-nlp/medmcqa-MedGENIE
language:
- en
metrics:
- accuracy
pipeline_tag: question-answering
tags:
- medical
widget:
- text: >-
    Which of the following is not true for myelinated nerve fibers: A. Impulse
    through myelinated fibers is slower than non-myelinated fibers B. Membrane
    currents are generated at nodes of Ranvier C. Saltatory conduction of
    impulses is seen D. Local anesthesia is effective only when the nerve is
    not covered by myelin sheath
  context: >-
    The myelin sheath of myelinated nerve fibers is a covering that acts as
    insulation and increases the rate of conduction. Therefore, impulse through
    myelinated fibers is faster than non-myelinated fibers. Understanding these
    differences in structure and function between these two types of nerve
    cells helps us appreciate how local anesthetics work, as well as why they
    are more effective on small-diameter axons (which are not heavily
    myelinated).
---

# Model Card for MedGENIE-fid-flan-t5-base-medmcqa

MedGENIE is a collection of language models designed to answer multiple-choice open-domain questions in the medical field using generated contexts rather than retrieved ones. Specifically, **MedGENIE-fid-flan-t5-base-medmcqa** is a *fusion-in-decoder* (FID) model based on [flan-t5-base](https://huggingface.co/google/flan-t5-base), trained on the [MedMCQA](https://huggingface.co/datasets/disi-unibo-nlp/medmcqa-MedGENIE) dataset and grounded in artificial contexts generated by [PMC-LLaMA-13B](https://huggingface.co/axiong/PMC_LLaMA_13B). It achieves performance comparable to larger *state-of-the-art* (SOTA) models on both the MedMCQA and MMLU-Medical benchmarks.

## Model description

- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** [google/flan-t5-base](https://huggingface.co/google/flan-t5-base)
- **Repository:** https://github.com/disi-unibo-nlp/medgenie
- **Paper:** [To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering](https://arxiv.org/abs/2403.01924)
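## How to use

The full fusion-in-decoder inference pipeline, which encodes each of the `n_context` generated passages separately and fuses them in the decoder, lives in the [MedGENIE repository](https://github.com/disi-unibo-nlp/medgenie). As a minimal illustrative sketch (not the official pipeline), the snippet below assumes the checkpoint's T5 weights can be loaded directly with `transformers`, and shows the conventional FiD input format with a single generated context; the Hub id is also an assumption based on this card's name:

```python
# Minimal sketch, assuming the checkpoint loads as a plain T5 model; faithful
# multi-passage inference requires the FiD wrapper from the MedGENIE repository.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_id = "disi-unibo-nlp/MedGENIE-fid-flan-t5-base-medmcqa"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = T5ForConditionalGeneration.from_pretrained(model_id)

question = (
    "Which of the following is not true for myelinated nerve fibers: "
    "A. Impulse through myelinated fibers is slower than non-myelinated fibers "
    "B. Membrane currents are generated at nodes of Ranvier "
    "C. Saltatory conduction of impulses is seen "
    "D. Local anesthesia is effective only when the nerve is not covered by myelin sheath"
)
context = (
    "The myelin sheath of myelinated nerve fibers is a covering that acts as "
    "insulation and increases the rate of conduction."
)

# FiD convention: a passage is encoded as "question: ... context: ...";
# with n_context > 1, each passage would be encoded separately.
inputs = tokenizer(
    f"question: {question} context: {context}",
    return_tensors="pt",
    truncation=True,
    max_length=600,  # matches the text_maxlength used during training
)
output_ids = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```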
## Performance

At the time of release (February 2024), **MedGENIE-fid-flan-t5-base-medmcqa** outperforms many fine-tuned and few-shot versions of 7B models on MedMCQA. Moreover, it emerges as the leading model on MMLU-Medical, a compilation of 9 medical subsets from [MMLU](https://huggingface.co/datasets/lukaemon/mmlu), second only to Zephyr-β (7B) augmented with MedWiki.

In the table below, the *Ground (Source)* column indicates the grounding contexts used, if any: ∅ = none, R = retrieved, G = generated.

| Model | Ground (Source) | Learning | Params | MedMCQA | MMLU-Medical | AVG (↓) |
|-------|-----------------|----------|--------|---------|--------------|---------|
| VOD ([Liévin et al. 2023](https://arxiv.org/abs/2210.06345)) | R (MedWiki) | Fine-tuned | 220M | 58.3 | 56.8 | 57.6 |
| MEDITRON ([Chen et al.](https://arxiv.org/abs/2311.16079)) | ∅ | Fine-tuned | 7B | 59.2 | 55.6 | 57.4 |
| Zephyr-β | R (MedWiki) | 2-shot | 7B | 47.0 | 66.7 | 56.9 |
| **MedGENIE-FID-Flan-T5** | **G (PMC-LLaMA)** | **Fine-tuned** | **250M** | **52.1** | **59.9** | **56.0** |
| PMC-LLaMA ([Chen et al.](https://arxiv.org/abs/2311.16079)) | ∅ | Fine-tuned | 7B | 51.4 | 59.7 | 55.6 |
| LLaMA-2 ([Chen et al.](https://arxiv.org/abs/2311.16079)) | ∅ | Fine-tuned | 7B | 54.4 | 56.3 | 55.4 |
| Zephyr-β ([Chen et al.](https://arxiv.org/abs/2311.16079)) | ∅ | 2-shot | 7B | 43.4 | 60.7 | 52.1 |
| Mistral-Instruct | R (MedWiki) | 2-shot | 7B | 44.3 | 58.5 | 51.4 |
| Mistral-Instruct ([Chen et al.](https://arxiv.org/abs/2311.16079)) | ∅ | 3-shot | 7B | 40.2 | 55.8 | 48.0 |
| LLaMA-2-chat | R (MedWiki) | 2-shot | 7B | 37.2 | 52.0 | 44.6 |
| LLaMA-2-chat | ∅ | 2-shot | 7B | 35.0 | 49.3 | 42.2 |

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- n_context: 5
- per_gpu_batch_size: 2
- accumulation_steps: 2
- total_steps: 182,816
- eval_freq: 22,852
- optimizer: AdamW
- scheduler: linear
- weight_decay: 0.01
- warmup_ratio: 0.1
- text_maxlength: 600

### Bias, Risks, and Limitations

Our model is trained on artificially generated contextual documents, which may inadvertently amplify inherent biases and depart from clinical and societal norms. This could lead to the spread of convincing medical misinformation. To mitigate this risk, we recommend a cautious approach: domain experts should manually review any output before real-world use. This ethical safeguard is crucial to prevent the dissemination of potentially erroneous or misleading information, particularly within clinical and scientific circles.

## Citation

If you find MedGENIE-fid-flan-t5-base-medmcqa useful in your work, please cite:

```bibtex
@misc{frisoni2024generate,
      title={To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering},
      author={Giacomo Frisoni and Alessio Cocchieri and Alex Presepi and Gianluca Moro and Zaiqiao Meng},
      year={2024},
      eprint={2403.01924},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```