---
license: llama3
language:
- en
- ja
- zh
base_model:
- meta-llama/Meta-Llama-3-8B
pipeline_tag: text-generation
library_name: transformers
---
# ELAINE-medLLM - Built with Llama-3-8B
ELAINE (EngLish-jApanese-chINesE)-medLLM is a trilingual (English, Japanese, Chinese) large language model adapted to the biomedical domain, built on Llama-3-8B.
The training data was carefully curated for volume and diversity to adapt the model to the biomedical domain and endow it with trilingual capability while preserving the knowledge and abilities of the base model.
Training proceeds in two stages: continued pre-training followed by supervised fine-tuning (SFT).
ELAINE-medLLM exhibits superior trilingual capabilities compared to existing bilingual or multilingual medical LLMs without substantially sacrificing the base model's general capability.
## Model Details
* **Model type**: Please refer to [Llama 3 Github](https://github.com/meta-llama/llama3) for details on the model architecture.
* **Language(s)**: English, Japanese, Chinese
* **Library**: [DeepSpeed](https://github.com/microsoft/DeepSpeed)
* **Tokenizer**: Please refer to [Llama 3 blog](https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md) for details on the tokenizer.
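
## How to use

A minimal inference sketch with the `transformers` library is shown below. The repository id `aistairc/Llama3-ELAINE-medLLM-8B`, the prompt, and the generation settings are assumptions for illustration; replace them with the actual Hub path and your preferred settings.

```python
# Minimal inference sketch; the repository id below is an assumption, not necessarily the published path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aistairc/Llama3-ELAINE-medLLM-8B"  # assumed repo id; adjust to the actual one

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # bf16 keeps the 8B model within a single modern GPU
    device_map="auto",
)

prompt = "Question: What are the common side effects of metformin?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Strip the prompt tokens and print only the newly generated text.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```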
## Evaluation Benchmarks
The evaluation benchmark datasets and evaluation code are available from [this GitHub repository](https://github.com/aistairc/medLLM_QA_benchmark).
The benchmarks are listed below; a minimal scoring sketch follows the lists.
### English evaluation benchmarks
- [MedQA](https://arxiv.org/abs/2009.13081)
- [MedQA-4op](https://arxiv.org/abs/2009.13081)
- [MMLU](https://arxiv.org/abs/2009.03300)
- [MedMCQA](https://proceedings.mlr.press/v174/pal22a.html)
- [PubMedQA](https://doi.org/10.18653/v1/D19-1259)
### Japanese evaluation benchmarks
- [IgakuQA](https://github.com/jungokasai/IgakuQA)
- We concatenate the original exam data from 2018 to 2022 into a single JSON file.
- [JJSIMQA](https://arxiv.org/abs/2310.10083)
- DenQA
- It contains exam problems and answers from the Japan National Dentistry Examination for the past two years (2023 and 2024), extracted from the official website of the Ministry of Health, Labour and Welfare of Japan (https://www.mhlw.go.jp/stf/english/index.html).
### Chinese evaluation benchmarks
- [MedQA](https://arxiv.org/abs/2009.13081)
- [MedQA-4op](https://arxiv.org/abs/2009.13081)
- [CMExam](https://arxiv.org/abs/2306.03030)
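
All of the benchmarks above are exam-style multiple-choice QA sets, and the tables in the Results section report accuracy-style scores. The snippet below is only a minimal illustration of that kind of scoring, not the official evaluation code from the benchmark repository; the record fields (`question`, `options`, `answer`) are assumed for illustration.

```python
# Illustrative multiple-choice scoring sketch (assumed record schema, not the official benchmark code).
def format_prompt(question: str, options: dict) -> str:
    """Render a question and its lettered options into a single prompt string."""
    lines = [f"Question: {question}"]
    lines += [f"{letter}. {text}" for letter, text in sorted(options.items())]
    lines.append("Answer:")
    return "\n".join(lines)

def accuracy(records: list, predict) -> float:
    """records: list of dicts with 'question', 'options' ({'A': ...}), 'answer' (a letter).
    predict: callable mapping a prompt string to a predicted option letter."""
    correct = sum(
        predict(format_prompt(r["question"], r["options"])).strip().upper() == r["answer"].upper()
        for r in records
    )
    return correct / max(len(records), 1)
```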
## Training Datasets
### Continued pre-training
For continued pretraining, we collected English, Japanese, and Chinese text in the bio-medical domain.
The collected domain text falls into six categories: 1) scientific papers, 2) medical guidelines, 3) biomedical web text, 4) biomedical textbooks, 5) PubMed abstracts, and 6) PubMed Central (PMC) archives.
For Japanese PubMed abstracts, we used the original English PubMed abstracts translated into Japanese.
We used only open-licensed text except for the Japanese biomedical papers from [J-STAGE](https://www.jstage.jst.go.jp/browse/-char/en).
### Instruction supervised fine-tuning
We collected various conversational QA datasets in the bio-medical domain from different data sources.
For English, we used Medical Meadow from MedAlpaca, along with the HealthCareMagic and iCliniq datasets used in ChatDoctor.
For Chinese and English, we adapted the augmented QA dataset from HuatuoGPT-2.
For Japanese, we used existing general-domain Alpaca datasets translated into Japanese.
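
As a rough sketch of how such conversational QA pairs can be rendered into training text for SFT, the example below applies a Llama-3 chat template to a single question/answer pair. The field names, the example content, and the use of the instruct tokenizer's `apply_chat_template` are assumptions for illustration, not the project's actual preprocessing pipeline.

```python
# Sketch: rendering one QA pair into Llama-3 chat format for SFT (assumed schema and tokenizer).
from transformers import AutoTokenizer

# Assumption: borrow the instruct model's chat template purely for formatting.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

qa_pair = {  # hypothetical record; the real SFT sources are listed above
    "question": "A patient on long-term metformin reports numbness in both feet. What should be checked?",
    "answer": "Vitamin B12 levels, since long-term metformin use can cause B12 deficiency and neuropathy.",
}

messages = [
    {"role": "user", "content": qa_pair["question"]},
    {"role": "assistant", "content": qa_pair["answer"]},
]

# Render the conversation as plain text; tokenization and loss masking happen later in training.
text = tokenizer.apply_chat_template(messages, tokenize=False)
print(text)
```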
## Results
### English benchmark
| model_name | MMLU | MedMCQA | MedQA | MedQA-4op | PubMedQA | Avg |
|---------------------------------------|--------|---------|--------|-----------|----------|--------|
| google_gemma-7b | 63.65 | 49.81 | 43.38 | 48.82 | 71.52 | 55.44 |
| meta-llama_Llama-2-7b-hf | 45.02 | 36.84 | 30.13 | 36.59 | 49.90 | 39.70 |
| meta-llama_Meta-Llama-3-8B | 71.22 | 56.97 | 52.60 | 57.89 | 69.70 | 61.68 |
| tokyotech-llm_Llama-3-Swallow-8B-v0.1 | 65.96 | 51.27 | 45.90 | 52.92 | 61.01 | 55.41 |
| Llama3-ELAINE-medLLM-8B | 67.80 | 54.55 | 50.47 | 57.73 | 67.27 | 59.56 |
### Japanese benchmark
| model_name | DenQA | IgakuQA | JJSIMQA | Avg |
|---------------------------------------|--------|---------|---------|--------|
| google_gemma-7b | 18.60 | 29.02 | 18.90 | 22.17 |
| meta-llama_Llama-2-7b-hf | 10.63 | 17.64 | 8.13 | 12.13 |
| meta-llama_Meta-Llama-3-8B | 18.88 | 35.09 | 23.52 | 25.83 |
| tokyotech-llm_Llama-3-Swallow-8B-v0.1 | 22.24 | 42.21 | 27.25 | 30.57 |
| Llama3-ELAINE-medLLM-8B | 22.38 | 44.06 | 29.45 | 31.96 |
### Chinese benchmark
| model_name | CMExam | MedQA | MedQA-4op | Avg |
|---------------------------------------|--------|--------|-----------|--------|
| google_gemma-7b | 36.34 | 40.54 | 43.03 | 39.97 |
| meta-llama_Llama-2-7b-hf | 24.33 | 25.02 | 29.61 | 26.32 |
| meta-llama_Meta-Llama-3-8B | 40.30 | 44.96 | 51.15 | 45.47 |
| tokyotech-llm_Llama-3-Swallow-8B-v0.1 | 36.19 | 40.89 | 48.00 | 41.69 |
| Llama3-ELAINE-medLLM-8B | 46.03 | 52.50 | 58.23 | 52.25 |
## Risks and Limitations
The models released here are still in the early stages of our research and development and have not been tuned to ensure outputs align with human intent and safety considerations.
## Acknowledgements
We thank Meta Research for releasing Llama 3 under a generous open license.
## Authors
- Ken Yano
- Zheheng Luo
- Jimin Huang
- Qianqian Xie
- Masaki Asada
- Chenhan Yuan
- Kailai Yang
- Makoto Miwa
- Sophia Ananiadou
- Jun'ichi Tsujii
## Contact
- Ken Yano [[email protected]]
## How to cite
If you find our work helpful, please cite the following paper.
```
@article{published_papers/48577159,
title = {ELAINE-medLLM: Lightweight English Japanese Chinese Trilingual Large Language Model for Bio-medical Domain (To appear)},
author = {Ken Yano and Zheheng Luo and Jimin Huang and Qianqian Xie and Masaki Asada and Chenhan Yuan and Kailai Yang and Makoto Miwa and Sophia Ananiadou and Jun'ichi Tsujii},
journal = {The 31st International Conference on Computational Linguistics (COLING 2025)},
month = {1},
year = {2025}
}
```