---
library_name: transformers
license: apache-2.0
datasets:
- raidium/ECNQA_generated_questions
language:
- en
metrics:
- accuracy
tags:
- medical
base_model: stanford-crfm/BioMedLM

---


# Model Card for Raidium MQG model


The model is introduced in the paper "Efficient Medical Question Answering with Knowledge-Augmented Question Generation".

Paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)

MQG is a transformer language model pre-trained on a series of medical textbooks and on medical questions generated by GPT-4. The weights are initialized from
[BioMedLM](https://huggingface.co/stanford-crfm/BioMedLM), then further pre-trained on those datasets.

The questions were generated from prompts containing medical data from the textbooks.
They are available here: [ECNQA_generated_questions](https://huggingface.co/datasets/raidium/ECNQA_generated_questions).

MQG is designed to be fine-tuned for Medical Question Answering tasks.

## Model Details

### Model Description

![image/png](https://cdn-uploads.huggingface.co/production/uploads/62cdea59a9be5c195561c2b8/tMb8cNuV6ZYnjrnUC1Tg2.png)

In the expanding field of language model applications, medical knowledge representation remains a significant challenge due to the specialized nature of the domain. 
Large language models, such as GPT-4, obtain reasonable scores on medical question answering tasks, but smaller models are far behind. 
In this work, we introduce a method to improve the proficiency of a small language model in the medical domain by employing a two-fold approach. 
We first fine-tune the model on a corpus of medical textbooks. Then, we use GPT-4 to generate questions similar to the downstream task, prompted with textbook knowledge, and use them to fine-tune the model. 
We show the benefits of our training strategy on a medical question answering dataset.


### Using the model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Download the tokenizer and the causal language model weights from the Hub.
tokenizer = AutoTokenizer.from_pretrained("raidium/MQG")
model = AutoModelForCausalLM.from_pretrained("raidium/MQG")
```


- **Developed by:** Raidium
- **Model type:** Transformer
- **License:** Apache 2.0
- **Finetuned from model:** [BioMedLM](https://huggingface.co/stanford-crfm/BioMedLM)

### Model Sources

- **Repository:** [https://github.com/raidium-med/MQG](https://github.com/raidium-med/MQG)
- **Paper:**  [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)

## Uses

### Direct Use

MQG is trained with a next-token-prediction objective on generated questions. 
It can therefore be used out of the box to generate potential answers for medical question answering tasks.
However, the generated questions used for training may contain errors, so we advise fine-tuning the model on your own dataset and using it to rank candidate answers.
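One way to rank candidate answers with a causal language model is to compare the average log-probability the model assigns to each answer given the question. The sketch below is an illustration of that idea, not the procedure used in the paper; the helper names `make_lm_scorer` and `rank_answers` are hypothetical, and the scorer assumes the question tokenizes to a prefix of the full question-plus-answer sequence, which is only approximately true.

```python
def make_lm_scorer(model, tokenizer):
    """Return a function scoring an answer by its mean token log-probability."""
    import torch  # imported lazily so rank_answers below has no dependencies

    def score(question, answer):
        # Number of tokens in the question alone (assumed to be a prefix of
        # the tokenized "question + answer" sequence).
        n_prompt = len(tokenizer(question).input_ids)
        full_ids = tokenizer(question + " " + answer, return_tensors="pt").input_ids
        full_ids = full_ids.to(model.device)
        with torch.no_grad():
            logits = model(full_ids).logits
        # The logits at position i predict the token at position i + 1.
        log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
        targets = full_ids[0, 1:]
        token_lp = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
        # Average over the answer tokens only.
        return token_lp[n_prompt - 1:].mean().item()

    return score


def rank_answers(question, candidates, score_fn):
    """Sort candidate answers from highest to lowest score."""
    scored = [(score_fn(question, c), c) for c in candidates]
    return [c for _, c in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

With `model` and `tokenizer` loaded as in "Using the model", a call would look like `rank_answers(question, candidates, make_lm_scorer(model, tokenizer))`.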

### Downstream Use

MQG can be fine-tuned for Medical Question Answering tasks. 
For multiple choice questions, a classification head should be appended at the end of the model, to rank different proposed answers.
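A minimal sketch of such a setup, assuming the checkpoint resolves to a GPT-2-style architecture supported by `AutoModelForSequenceClassification` in `transformers`. The classification head is newly initialized, so its scores are meaningless until the model has been fine-tuned. The prompt format and the helper names (`format_choice_inputs`, `load_mqg_classifier`, `score_propositions`) are hypothetical and not taken from the paper.

```python
def format_choice_inputs(question, propositions):
    """Pair the question with each proposition (hypothetical prompt format)."""
    return [f"{question}\nProposition: {p}" for p in propositions]


def load_mqg_classifier(model_name="raidium/MQG"):
    """Load MQG with a freshly initialized single-logit head on top."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
    # GPT-2-style tokenizers often ship without a padding token; reuse EOS.
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model.config.pad_token_id = tokenizer.pad_token_id
    return tokenizer, model


def score_propositions(tokenizer, model, question, propositions):
    """Return one score per proposition; a higher score ranks first."""
    import torch

    batch = tokenizer(format_choice_inputs(question, propositions),
                      return_tensors="pt", padding=True, truncation=True)
    batch = batch.to(model.device)
    with torch.no_grad():
        logits = model(**batch).logits  # shape: (num_propositions, 1)
    return logits.squeeze(-1).tolist()
```

Using a single output logit per (question, proposition) pair keeps the head independent of the number of propositions, so the same model can rank any number of candidate answers.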

### Out-of-Scope Use

This model should not be used for tasks outside the medical domain.

## Bias, Risks, and Limitations

There is no guarantee that the model answers medical questions correctly. It should only be used for academic purposes, and not in clinical care.

## Training Details

### Training Data

The model is trained on a corpus of medical textbooks, and further pre-trained on generated questions:  [ECNQA_generated_questions](https://huggingface.co/datasets/raidium/ECNQA_generated_questions).

### Training Procedure

MQG is trained using next-token-prediction on both datasets.

#### Training Hyperparameters

- **Training regime:** fp16 mixed-precision training.

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

We tested the model on ECN-QA, a medical question answering dataset based on the French medical residency examination. 
It is composed of "single" questions and "progressive" questions (i.e., series of multiple related questions).
It is a multiple-choice dataset containing 5 propositions for each question.

#### Metrics

We use accuracy to evaluate the model on medical question answering.
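For reference, plain exact-match accuracy can be computed as in the sketch below. How predictions are aggregated over propositions and over progressive questions in ECN-QA is described in the paper and is not reproduced here; the function name `accuracy` and the label encoding are illustrative assumptions.

```python
def accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Example with made-up labels: three of the four predictions match.
print(accuracy([0, 1, 2, 3], [0, 1, 2, 0]))  # 0.75
```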

### Results

See paper: [https://arxiv.org/abs/2405.14654](https://arxiv.org/abs/2405.14654)

### Model Architecture and Objective

The model is based on BioMedLM's architecture, which is a modified version of the GPT-2 architecture.

### Compute Infrastructure

#### Hardware

The model was trained on the Jean-Zay supercomputer, on multiple nodes with 4 A100 GPUs.

#### Software

PyTorch, DeepSpeed

## Citation


**BibTeX:**
```
@article{khlaut2024efficient,
  title={Efficient Medical Question Answering with Knowledge-Augmented Question Generation},
  author={Khlaut, Julien and Dancette, Corentin and Ferreres, Elodie and Bennani, Alaedine and H{\'e}rent, Paul and Manceron, Pierre},
  journal={Clinical NLP Workshop, NAACL 2024},
  year={2024}
}
```

## Model Card Contact

julien.khlaut at raidium.fr