|
--- |
|
library_name: transformers |
|
license: apache-2.0 |
|
datasets: |
|
- raidium/ECNQA_generated_questions |
|
- raidium/ECN-QA |
|
language: |
|
- en |
|
metrics: |
|
- accuracy |
|
tags: |
|
- medical |
|
base_model: stanford-crfm/BioMedLM |
|
--- |
|
|
|
|
|
# Model Card for Raidium MQG model |
|
|
|
|
|
The model is introduced in the paper "Efficient Medical Question Answering with Knowledge-Augmented Question Generation". |
|
|
|
Paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654) |
|
|
|
MQG is a transformer language model pre-trained on a corpus of medical textbooks and on medical questions generated by GPT-4. The weights are initialized with
[BioMedLM](https://huggingface.co/stanford-crfm/BioMedLM), then further pre-trained on those datasets.
|
|
|
The questions were generated from prompts containing medical data from the textbooks.
|
They are available here: [ECNQA_generated_questions](https://huggingface.co/datasets/raidium/ECNQA_generated_questions). |
|
|
|
MQG is designed to be fine-tuned for Medical Question Answering tasks. |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cdea59a9be5c195561c2b8/tMb8cNuV6ZYnjrnUC1Tg2.png) |
|
|
|
In the expanding field of language model applications, medical knowledge representation remains a significant challenge due to the specialized nature of the domain. |
|
Large language models, such as GPT-4, obtain reasonable scores on medical question answering tasks, but smaller models are far behind. |
|
In this work, we introduce a method to improve the proficiency of a small language model in the medical domain by employing a two-fold approach. |
|
We first fine-tune the model on a corpus of medical textbooks. Then, we use GPT-4 to generate questions similar to the downstream task, prompted with textbook knowledge, and use them to fine-tune the model. |
|
We show the benefits of our training strategy on a medical question answering dataset.
|
|
|
|
|
### Using the model |
|
|
|
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# load the MQG tokenizer and pre-trained weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("raidium/MQG")
model = AutoModelForCausalLM.from_pretrained("raidium/MQG")
```
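
Once loaded, the model can be used like any causal language model. A minimal generation sketch (the prompt is illustrative):

```python
prompt = "Question: What is the most common cause of community-acquired pneumonia?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")

# greedy decoding; adjust max_new_tokens / sampling to taste
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```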
|
|
|
|
|
- **Developed by:** Raidium |
|
- **Model type:** Transformer |
|
- **License:** Apache 2.0
|
- **Finetuned from model:** [BioMedLM](https://huggingface.co/stanford-crfm/BioMedLM) |
|
|
|
### Model Sources
|
|
|
|
|
|
- **Repository:** [https://github.com/raidium-med/MQG](https://github.com/raidium-med/MQG)
|
- **Paper:** [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654) |
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
MQG is trained using next-token prediction on generated questions.
Therefore, it can be used out of the box to generate potential answers for medical question answering tasks.
However, since the training questions were machine-generated, they may contain errors, so it is advised to fine-tune the model on your own dataset and to use it to rank candidate answers, as sketched below.
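
For instance, a simple way to rank candidate answers is to score each one by its average log-likelihood under the model. A minimal sketch, reusing the `tokenizer` and `model` loaded above (the question and candidates are illustrative):

```python
import torch

question = "Which vitamin deficiency causes scurvy? Answer:"
candidates = [" Vitamin A", " Vitamin B12", " Vitamin C", " Vitamin D"]

def answer_score(question: str, answer: str) -> float:
    # average log-probability of the answer tokens, conditioned on the question
    q_len = tokenizer(question, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(question + answer, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # position t predicts token t+1
    scores = [log_probs[t, full_ids[0, t + 1]] for t in range(q_len - 1, full_ids.shape[1] - 1)]
    return torch.stack(scores).mean().item()

best = max(candidates, key=lambda a: answer_score(question, a))
print(best)
```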
|
|
|
### Downstream Use |
|
|
|
MQG can be fine-tuned for Medical Question Answering tasks. |
|
For multiple-choice questions, a classification head should be appended to the model to rank the proposed answers, as in the sketch below.
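
With `transformers`, one possible setup is to reload the pre-trained weights with a sequence-classification head and score each (question, proposition) pair independently; note the head is newly initialized and must be trained. A minimal sketch (the input string is illustrative):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("raidium/MQG")
# num_labels=1: a single relevance logit per (question, proposition) pair
clf = AutoModelForSequenceClassification.from_pretrained("raidium/MQG", num_labels=1)
clf.config.pad_token_id = tokenizer.eos_token_id  # GPT-2-style models define no pad token

inputs = tokenizer("Question: ... Proposition: ...", return_tensors="pt")
score = clf(**inputs).logits[0, 0]  # rank the 5 propositions by this score
```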
|
|
|
### Out-of-Scope Use |
|
|
|
This model should not be used for tasks outside the medical domain.
|
|
|
## Bias, Risks, and Limitations |
|
|
|
There is no guarantee that the model answers medical questions correctly. It should only be used for academic purposes, and not in clinical care. |
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
The model is trained on a corpus of medical textbooks, and further pre-trained on generated questions: [ECNQA_generated_questions](https://huggingface.co/datasets/raidium/ECNQA_generated_questions). |
|
|
|
### Training Procedure |
|
|
|
MQG is trained using next-token prediction on both datasets.
|
|
|
#### Training Hyperparameters |
|
|
|
- **Training regime:** fp16 mixed-precision training.
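
As an illustrative sketch only (not the authors' exact training code), continued pre-training with next-token prediction in fp16 could look like the following, assuming a pre-tokenized `train_dataset`:

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("stanford-crfm/BioMedLM")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2-style tokenizer has no pad token
model = AutoModelForCausalLM.from_pretrained("stanford-crfm/BioMedLM")

# mlm=False selects the standard next-token-prediction (causal LM) objective
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="mqg-pretraining",
    fp16=True,  # fp16 mixed-precision training
    per_device_train_batch_size=4,
    num_train_epochs=1,
)

Trainer(model=model, args=args, data_collator=collator,
        train_dataset=train_dataset).train()
```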
|
|
|
## Evaluation |
|
|
|
### Testing Data, Factors & Metrics |
|
|
|
#### Testing Data |
|
|
|
We tested the model on ECN-QA, a medical question answering dataset based on the French medical residency examination.
It is composed of "single" and "progressive" questions (i.e., series of related questions).
It is a multiple-choice question dataset, with 5 propositions for each question.
|
|
|
#### Metrics |
|
|
|
We use accuracy to evaluate the model on medical question answering.
|
|
|
### Results |
|
|
|
See paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654) |
|
|
|
### Model Architecture and Objective |
|
|
|
The model is based on BioMedLM's architecture, which is a modified GPT-2 architecture.
|
|
|
### Compute Infrastructure |
|
|
|
#### Hardware |
|
|
|
The model was trained on the Jean Zay supercomputer, on multiple nodes with 4 A100 GPUs each.
|
|
|
#### Software |
|
|
|
PyTorch, DeepSpeed
|
|
|
## Citation |
|
|
|
|
|
**BibTeX:** |
|
``` |
|
@article{khlaut2024efficient, |
|
title={Efficient Medical Question Answering with Knowledge-Augmented Question Generation}, |
|
author={Khlaut, Julien and Dancette, Corentin and Ferreres, Elodie and Bennani, Alaedine and H{\'e}rent, Paul and Manceron, Pierre}, |
|
journal={Clinical NLP Workshop, NAACL 2024}, |
|
year={2024} |
|
} |
|
``` |
|
|
|
## Model Card Contact |
|
|
|
julien.khlaut at raidium.fr |