---
license: afl-3.0
datasets:
- photonmz/roco-instruct-65k
language:
- en
library_name: transformers
pipeline_tag: image-to-text
tags:
- biology
- medical
---

# Model Card for BabyDoctor

This model card documents BabyDoctor, a multimodal large language model (MLLM) that combines CLIP and LLaMA 2 to understand images and to understand and generate text. It has been fine-tuned to interpret radiology images such as X-rays, ultrasounds, MRIs, and CT scans using medical terminology.

## Model Details

### Model Description

BabyDoctor uses an auto-regressive language model that combines an optimized transformer architecture with a vision encoder. The fine-tuned versions leverage supervised fine-tuning (SFT), Low-Rank Adaptation (LoRA), and Quantized LoRA (QLoRA) for improved specialization in the medical domain.

- **Developed by:** Markus Zhang and Vir Chau
- **Model type:** Multimodal Large Language Model
- **Language(s) (NLP):** English
- **License:** Academic research only. Subject to the LLaMA 2, CLIP, GPT-4, and LLaVA licenses.
- **Finetuned from model:** Base LLM: LLaMA-2-7B-Chat; Base Vision Encoder: CLIP-L

### Model Sources

- **Repository:** [BabyDoctor Repository](https://github.com/photomz/BabyDoctor)
- **Demo:** [Demo Video](https://www.loom.com/share/54c1f5ed36f74914b689695dae9e8e20)

## Uses

### Direct Use

BabyDoctor is intended for research use in English. It is primarily designed for assistant-like chat within the medical and health domain, providing interpretation and analysis of radiology images.

### Downstream Use 

Potential applications of BabyDoctor may include but are not limited to research, academic projects, and non-production applications in the health and medical domain.

### Out-of-Scope Use

BabyDoctor should not be used in any manner that violates applicable laws or regulations (including trade compliance laws), in languages other than English, or in any other way that is prohibited by the Acceptable Use Policy and Licensing Agreement for BabyDoctor. The model is not ready for production user-facing use cases and requires further tuning.

## Bias, Risks, and Limitations

While BabyDoctor is aimed at providing helpful medical and health-related advice, it should not be considered a replacement for professional medical advice. There may be areas of medicine or health that it does not cover as accurately. The model does not have access to individual health records or specific patient information, and its advice should not replace a consultation with a healthcare professional.

## How to Get Started with the Model

Instructions for reproducing the results with BabyDoctor and running the model on your own data can be found in the [BabyDoctor Repository](https://github.com/photomz/BabyDoctor).

## Training Details

### Training Data

BabyDoctor was trained using the LLaVA-Instruct-80K and [Roco-Instruct-65K](https://huggingface.co/datasets/photonmz/roco-instruct-65k) datasets, a general instruction-following dataset and a medical dataset, respectively. No Meta user data was included in the pretraining or the fine-tuning datasets.

The pretraining data has a cutoff of September 2022, but some tuning data is more recent, up to July 2023.

### Training Procedure 

#### Preprocessing 

The base models, LLaMA-2-7B-Chat and CLIP, were pretrained on 2T tokens and 1-100M images. LLaVA then projects CLIP's image features into LLaMA 2's embedding space and trains on synthetic GPT-4 instruction-following data. Finally, BabyDoctor was fine-tuned to interpret radiology images.
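
The projection step described above can be illustrated with a small sketch. It assumes a single linear projection, as in the original LLaVA design, and uses toy dimensions; the function name `project_patches`, the sizes, and the random weights are all hypothetical and are not taken from the BabyDoctor code.

```python
import random


def project_patches(patch_features, weight, bias):
    """Map each vision feature vector (length d_vision) to length d_llm."""
    projected = []
    for patch in patch_features:
        row = [
            sum(w * x for w, x in zip(weight_row, patch)) + b
            for weight_row, b in zip(weight, bias)
        ]
        projected.append(row)
    return projected


# Toy sizes (assumptions for illustration); the real encoders are much wider.
D_VISION, D_LLM, NUM_PATCHES = 4, 6, 3

rng = random.Random(0)
weight = [[rng.uniform(-0.1, 0.1) for _ in range(D_VISION)] for _ in range(D_LLM)]
bias = [0.0] * D_LLM
patches = [[rng.uniform(-1, 1) for _ in range(D_VISION)] for _ in range(NUM_PATCHES)]

# Image patches become "tokens" with the same width as the LLM's text embeddings.
image_tokens = project_patches(patches, weight, bias)

# Stand-in for already-embedded prompt tokens (hypothetical values).
text_tokens = [[0.0] * D_LLM for _ in range(2)]

# The LLM then consumes one sequence that mixes image and text embeddings.
llm_input = image_tokens + text_tokens
print(len(llm_input), len(llm_input[0]))
```

Once projected, the image features share the language model's embedding width, so they can be concatenated with the text embeddings into one input sequence.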

#### Training Hyperparameters

- Base LLM: LLaMA-2-7B-Chat
- Base Vision Encoder: CLIP-L
- Pretraining Data: LCS-558K
- Pretraining Schedule: 1 epoch
- Finetuning Data 1: LLaVA-Instruct-80K
- Finetuning Schedule 1: LoRA (Low-Rank Adaptation), 1 epoch
- Finetuning Data 2: [roco-instruct-65k](https://huggingface.co/datasets/photonmz/roco-instruct-65k)
- Finetuning Schedule 2: QLoRA (Quantized LoRA), 4-bit, 1 epoch
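
To give a sense of why LoRA keeps fine-tuning cheap, the sketch below counts trainable parameters for one weight matrix. LoRA freezes the original `d_out x d_in` weight and trains two low-rank factors, `B` (`d_out x r`) and `A` (`r x d_in`). The hidden size 4096 matches LLaMA-2-7B; the rank `r = 8` is a hypothetical value, since this card does not state the rank that was used.

```python
def full_params(d_out, d_in):
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


HIDDEN = 4096  # LLaMA-2-7B hidden size
RANK = 8       # hypothetical rank; not stated in this model card

full = full_params(HIDDEN, HIDDEN)
lora = lora_trainable_params(HIDDEN, HIDDEN, RANK)

print(f"full fine-tuning: {full:,} parameters")
print(f"LoRA (r={RANK}):     {lora:,} parameters")
print(f"fraction trained: {lora / full:.4%}")
```

Under these assumptions a single 4096 x 4096 projection goes from 16,777,216 trainable parameters to 65,536, which is under half a percent of the original.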

#### Speeds, Sizes, Times

Training took 8 hours on a single A10 cloud GPU from Lambda Labs.
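
As a rough illustration of why 4-bit quantization helps on a single GPU, the following back-of-the-envelope estimate compares weight storage at 16 bits and 4 bits per parameter. It assumes a nominal 7 billion parameters and counts weights only; activations, LoRA adapters, optimizer state, and the vision encoder are ignored, so real usage will be higher. The helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Storage for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


NUM_PARAMS = 7_000_000_000  # nominal size of a "7B" model (approximation)
A10_MEMORY_GIB = 24         # NVIDIA A10 memory, treated here as 24 GiB

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
int4_gib = weight_memory_gib(NUM_PARAMS, 4)

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f" 4-bit weights: {int4_gib:.1f} GiB")
print(f"GPU memory:     {A10_MEMORY_GIB} GiB")
```

Quantizing the frozen base weights to 4 bits cuts their footprint to a quarter of the 16-bit figure, which leaves more of the GPU's memory for activations and the trainable adapter.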

## Evaluation

### Testing Data, Factors & Metrics

Given its specific purpose of interpreting radiology images, BabyDoctor has not been evaluated on as wide a range of tasks as the LLaMA 2 models.

### Recommendations

Users (both direct and downstream) should be aware of the limitations and intended use of the model. They should not consider the information generated by BabyDoctor as a replacement for professional medical advice.


## Citation

**BibTeX:**

@misc{photomz2023,
  author = {Markus Zhang and Vir Chau},
  title = {BabyDoctor},
  year = {2023},
  howpublished = {\url{https://github.com/photomz/BabyDoctor}},
  note = {GitHub}
}

**APA:**

Zhang, M., & Chau, V. (2023). *BabyDoctor*. GitHub. https://github.com/photomz/BabyDoctor


Contact us by submitting a [GitHub issue](https://github.com/photomz/BabyDoctor)!