shengz committed on
Commit
81597ca
·
verified ·
1 Parent(s): c130aa0

Update README.md

Files changed (1)
  1. README.md +24 -12
README.md CHANGED
@@ -9,7 +9,8 @@ tags:
9
 
10
  # LLaVA-Med v1.5, using [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) as LLM for a better commercial license
11
 
12
- LLaVA-Med combines a pre-trained large language model with a pre-trained image encoder for biomedical multimodal chatbot use cases.
 
13
  LLaVA-Med was proposed in [LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day](https://arxiv.org/abs/2306.00890) by Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, Jianfeng Gao.
14
 
15
 
@@ -27,22 +28,33 @@ https://github.com/microsoft/LLaVA-Med/issues
27
  [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) license.
28
 
29
  ## Intended use
30
- **Primary intended uses:**
31
- The primary use of LLaVA-Med is biomedical research on large multimodal models and chatbots.
32
 
33
- **Primary intended users:**
34
- The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
35
 
36
- ## Training dataset
37
- - 500K filtered image-text pairs from PubMed.
38
- - 60K GPT-generated multimodal instruction-following data.
39
 
40
- ## Evaluation dataset
41
- [Medical Visual Chat](https://github.com/microsoft/LLaVA-Med?tab=readme-ov-file#medical-visual-chat-gpt-assisted-evaluation)
42
 
 
43
 
44
- ### How to use
45
- See [Serving](https://github.com/microsoft/LLaVA-Med?tab=readme-ov-file#serving) and [Evaluation](https://github.com/microsoft/LLaVA-Med?tab=readme-ov-file#evaluation).
46
 
47
 
48
  ### BibTeX entry and citation info
 
9
 
10
  # LLaVA-Med v1.5, using [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) as LLM for a better commercial license
11
 
12
+ Large Language and Vision Assistant for bioMedicine (i.e., “LLaVA-Med”) is a large language and vision model trained using a curriculum learning method for adapting LLaVA to the biomedical domain. It is an open-source release intended for research use only, to facilitate reproducibility of the corresponding paper, which claims improved performance on open-ended biomedical question-answering tasks, including common visual question answering (VQA) benchmark datasets such as PathVQA and VQA-RAD.
13
+
14
  LLaVA-Med was proposed in [LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day](https://arxiv.org/abs/2306.00890) by Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, Jianfeng Gao.
15
 
16
 
 
28
  [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) license.
29
 
30
  ## Intended use
 
 
31
 
32
+ The data, code, and model checkpoints are intended to be used solely for (I) future research on visual-language processing and (II) reproducibility of the experimental results reported in the reference paper. The data, code, and model checkpoints are not intended to be used in clinical care or for any clinical decision-making purposes.
33
+
34
+ ### Primary Intended Use
35
+
36
+ The primary intended use is to support AI researchers reproducing and building on top of this work. LLaVA-Med and its associated models should be helpful for exploring various biomedical vision-language processing (VLP) and visual question answering (VQA) research questions.
37
+
38
+ ### Out-of-Scope Use
39
+
40
+ Any deployed use case of the model, commercial or otherwise, is out of scope. Although we evaluated the models using a broad set of publicly available research benchmarks, the models and evaluations are intended for research use only and not for deployed use cases. Please refer to [the associated paper](https://aka.ms/llava-med) for more details.
41
+
42
+
43
+ ## Data
44
+
45
+ This model builds upon the [PMC-15M dataset](https://aka.ms/biomedclip-paper), a large-scale parallel image-text dataset for biomedical vision-language processing. It contains 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central and covers a diverse range of biomedical image types, such as microscopy, radiography, histology, and more.
46
+
47
+
48
+
49
+ ## How to use
50
 
51
+ See the [Serving](https://github.com/microsoft/LLaVA-Med?tab=readme-ov-file#serving) and [Evaluation](https://github.com/microsoft/LLaVA-Med?tab=readme-ov-file#evaluation) sections in the [LLaVA-Med repo](https://aka.ms/llava-med).
 
 
52
 
53
+ ## Limitations
 
54
 
55
+ This model was developed using English corpora, and thus may be considered English-only. It was evaluated on a narrow set of biomedical benchmark tasks, described in the [LLaVA-Med paper](https://aka.ms/llava-med). As such, it is not suitable for use in any clinical setting. Under some conditions, the model may make inaccurate predictions and display limitations, which may require additional mitigation strategies. In particular, this model is likely to carry many of the limitations of the model from which it is derived, [LLaVA](https://llava-vl.github.io/).
56
 
57
+ Further, this model was developed in part using the [PMC-15M](https://aka.ms/biomedclip-paper) dataset. The figure-caption pairs that make up this dataset may contain biases reflecting the current practice of academic publication. For example, the corresponding papers may be enriched for positive findings, contain examples of extreme cases, and otherwise reflect distributions that are not representative of other sources of biomedical data.
 
58
 
59
 
60
  ### BibTeX entry and citation info