---
license: mit
language:
- ko
base_model:
- klue/bert-base
pipeline_tag: feature-extraction
tags:
- medical
---
|
# Korean Medical DPR (Dense Passage Retrieval)

## 1. Intro
A Bi-Encoder retrieval model for the **medical domain**.
To handle medical records written in a mix of Korean and English, **SapBERT-KO-EN** was used as the base model.
Questions are encoded with the Question Encoder, and texts are encoded with the Context Encoder.

- Question Encoder : [https://huggingface.co/snumin44/medical-biencoder-ko-bert-question](https://huggingface.co/snumin44/medical-biencoder-ko-bert-question)

(※ This model was trained on AI Hub's [초거대 AI 헬스케어 질의 응답 데이터 (Hyperscale AI Healthcare Question-Answering Data)](https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&dataSetSn=71762).)
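
With a Bi-Encoder, the question and each candidate text are encoded independently, and texts are ranked by how similar their vectors are to the question vector. The sketch below illustrates only that ranking step. It assumes dot-product similarity (the scoring used in the original DPR formulation) and uses small hand-made vectors in place of real encoder outputs; `rank_contexts` is a hypothetical helper, not part of the released models.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def rank_contexts(question_vec: Vector, context_vecs: List[Vector]) -> List[Tuple[int, float]]:
    """Return (context index, score) pairs, most similar first.

    In a real setup, question_vec would come from the Question Encoder and
    context_vecs from the Context Encoder; here they are toy values.
    """
    scores = [(i, dot(question_vec, c)) for i, c in enumerate(context_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (hypothetical values, not real model outputs).
question = [0.9, 0.1, 0.0]   # e.g. a question about hypertension
contexts = [
    [0.1, 0.9, 0.0],         # a text about diabetes
    [0.8, 0.2, 0.1],         # a text about hypertension
    [0.0, 0.1, 0.9],         # a text about a fracture
]

for idx, score in rank_contexts(question, contexts):
    print(idx, round(score, 2))
```

Because the two sides never interact inside the encoders, text vectors can be computed once and indexed ahead of time; only the incoming question has to be encoded at search time.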

## 2. Model

**(1) Self Alignment Pretraining (SAP)**

Korean medical records are written in a **mix of Korean and English**, so the model also needs to recognize English terms.
Using Multi-Similarity Loss, the model was trained so that **terms sharing the same code** have high similarity to one another.
```
e.g.) C3843080 || 고혈압 질환
      C3843080 || Hypertension
      C3843080 || High Blood Pressure
      C3843080 || HTN
      C3843080 || HBP
```
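
In self-alignment training the code acts as the label: terms that share a code form positive pairs, and terms with different codes serve as negatives. A minimal sketch of turning `CODE || term` lines like the ones above into those labels in plain Python. The helper names are illustrative, and `C1234567` with its two terms is a hypothetical code added only so that a negative exists.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def group_by_code(lines: List[str]) -> Dict[str, List[str]]:
    """Group 'CODE || term' lines into {code: [term, ...]}."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        code, term = (part.strip() for part in line.split("||", 1))
        groups[code].append(term)
    return dict(groups)


def positive_pairs(groups: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Every pair of terms that share a code is a positive pair."""
    pairs: List[Tuple[str, str]] = []
    for terms in groups.values():
        pairs.extend(combinations(terms, 2))
    return pairs


def same_code_mask(labels: List[str]) -> List[List[bool]]:
    """mask[i][j] is True when items i and j (i != j) share a code."""
    n = len(labels)
    return [[i != j and labels[i] == labels[j] for j in range(n)] for i in range(n)]


lines = [
    "C3843080 || 고혈압 질환",
    "C3843080 || Hypertension",
    "C3843080 || High Blood Pressure",
    "C3843080 || HTN",
    "C3843080 || HBP",
    "C1234567 || 당뇨병",     # hypothetical code, for illustration only
    "C1234567 || Diabetes",   # hypothetical code, for illustration only
]

groups = group_by_code(lines)
pairs = positive_pairs(groups)
print(len(pairs))  # 11
mask = same_code_mask(["C3843080", "C3843080", "C1234567"])
```

A batch-level loss such as Multi-Similarity Loss consumes this kind of same-code mask; how SapBERT-KO-EN actually samples and batches KOSTOM terms may differ.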

- SapBERT-KO-EN : [https://huggingface.co/snumin44/sap-bert-ko-en](https://huggingface.co/snumin44/sap-bert-ko-en)
- Github : [https://github.com/snumin44/SapBERT-KO-EN](https://github.com/snumin44/SapBERT-KO-EN)

**(2) Dense Passage Retrieval (DPR)**

Turning SapBERT-KO-EN into a retrieval model requires additional fine-tuning.
The model was fine-tuned with the DPR approach, which uses a Bi-Encoder architecture to compute the similarity between a query and a text.
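
In DPR-style fine-tuning, each question in a batch is paired with its gold text, and the other texts in the same batch act as negatives (in-batch negatives). The loss is the negative log-likelihood of the gold text under a softmax over question-text dot products. The sketch below computes it on toy vectors; it assumes plain in-batch negatives, and the actual DPR-KO training code may differ (for example, by adding hard negatives).

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product used as the question-text score."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_negative_loss(questions: List[Vector], contexts: List[Vector]) -> float:
    """Mean DPR loss, where contexts[i] is the gold text for questions[i].

    For question i, its scores against every text in the batch are the
    logits; the loss is -log softmax(logits)[i], averaged over the batch.
    """
    if len(questions) != len(contexts):
        raise ValueError("each question needs exactly one gold context")
    total = 0.0
    for i, q in enumerate(questions):
        logits = [dot(q, c) for c in contexts]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_norm - logits[i]
    return total / len(questions)


# Toy batch of two question/text pairs (hypothetical embeddings).
questions = [[1.0, 0.0], [0.0, 1.0]]
contexts = [[2.0, 0.0], [0.0, 2.0]]
print(round(in_batch_negative_loss(questions, contexts), 4))  # 0.1269
```

Under this assumption, with the batch size of 64 listed in the Training section, each question would be contrasted with 63 in-batch negatives.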

As shown below, the model was trained on the existing dataset **augmented with mixed Korean-English samples**.
```
e.g.) Korean disease name : 고혈압
      English disease name: Hypertension
      Query (original)    : 아버지가 고혈압인데 그게 뭔지 모르겠어. 고혈압이 뭔지 설명좀 해줘.
      Query (augmented)   : 아버지가 Hypertension 인데 그게 뭔지 모르겠어. Hypertension 이 뭔지 설명좀 해줘.
      ("My father has hypertension, but I don't know what that is. Please explain what hypertension is.")
```
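
The augmentation above replaces the Korean disease name in a query with its English name to produce a mixed Korean-English sample. A minimal sketch, assuming a simple string substitution driven by a Korean-to-English name dictionary; `KO_TO_EN`, `augment_query`, and `augment_dataset` are hypothetical, and the real pipeline may choose names and positions differently.

```python
import re
from typing import Dict, List

# Hypothetical Korean -> English disease-name dictionary (illustration only).
KO_TO_EN: Dict[str, str] = {"고혈압": "Hypertension"}


def augment_query(query: str, ko_name: str, en_name: str) -> str:
    """Replace a Korean disease name in the query with its English name.

    When the Korean name is directly followed by a Hangul syllable
    (a particle such as 인데 or 이), a space is inserted after the English
    name, as in the augmented query above.
    """
    # Occurrences followed by a Hangul syllable get a separating space.
    out = re.sub(re.escape(ko_name) + r"(?=[가-힣])", lambda _m: en_name + " ", query)
    # Remaining occurrences (end of string, whitespace, punctuation).
    return out.replace(ko_name, en_name)


def augment_dataset(queries: List[str], names: Dict[str, str]) -> List[str]:
    """Keep every original query and add one mixed-language copy per matching name."""
    augmented = list(queries)
    for q in queries:
        for ko, en in names.items():
            if ko in q:
                augmented.append(augment_query(q, ko, en))
    return augmented


original = "아버지가 고혈압인데 그게 뭔지 모르겠어. 고혈압이 뭔지 설명좀 해줘."
print(augment_query(original, "고혈압", KO_TO_EN["고혈압"]))
```

Keeping the original queries alongside the augmented ones means the encoder still sees the all-Korean phrasing while also learning to match the English name.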

- Github : [https://github.com/snumin44/DPR-KO](https://github.com/snumin44/DPR-KO)

## 3. Training

**(1) Self Alignment Pretraining (SAP)**

The base model and hyperparameters used to train SapBERT-KO-EN are as follows.
**KOSTOM**, a medical terminology dictionary covering both Korean and English medical terms, was used as the training data.

- Model : klue/bert-base
- Dataset : **KOSTOM**
- Epochs : 1
- Batch Size : 64
- Max Length : 64
- Dropout : 0.1
- Pooler : 'cls'
- Eval Step : 100
- Threshold : 0.8
- Scale Positive Sample : 1
- Scale Negative Sample : 60
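
Threshold, Scale Positive Sample, and Scale Negative Sample appear to correspond to the parameters of Multi-Similarity Loss. The sketch below implements the standard form of that loss for one batch in plain Python, assuming Threshold is the similarity offset λ, Scale Positive Sample is α, and Scale Negative Sample is β. That mapping is an assumption (in SapBERT-style training a threshold can also belong to a separate pair miner), and pair mining is omitted here.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def multi_similarity_loss(
    embeddings: List[Vector],
    labels: List[str],
    alpha: float = 1.0,  # assumed to be "Scale Positive Sample"
    beta: float = 60.0,  # assumed to be "Scale Negative Sample"
    lam: float = 0.8,    # assumed to be "Threshold"
) -> float:
    """Multi-Similarity Loss averaged over anchors, without pair mining.

    For anchor i with positives P_i (same label) and negatives N_i:
      (1/alpha) * log(1 + sum_{k in P_i} exp(-alpha * (S_ik - lam)))
    + (1/beta)  * log(1 + sum_{k in N_i} exp( beta  * (S_ik - lam)))
    where S_ik is the cosine similarity between items i and k.
    """
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        pos: List[float] = []
        neg: List[float] = []
        for k in range(n):
            if k == i:
                continue
            s = cosine(embeddings[i], embeddings[k])
            (pos if labels[k] == labels[i] else neg).append(s)
        # Pull positives above lam; push negatives below lam.
        pos_term = math.log1p(sum(math.exp(-alpha * (s - lam)) for s in pos)) / alpha
        neg_term = math.log1p(sum(math.exp(beta * (s - lam)) for s in neg)) / beta
        total += pos_term + neg_term
    return total / n


# Toy embeddings: items 0 and 1 point in nearly the same direction, item 2 is orthogonal.
emb = [[1.0, 0.0], [0.99, 0.14], [0.0, 1.0]]
matched = ["C3843080", "C3843080", "C1234567"]      # C1234567 is hypothetical
mismatched = ["C3843080", "C1234567", "C3843080"]
print(multi_similarity_loss(emb, matched) < multi_similarity_loss(emb, mismatched))  # True
```

With β much larger than α, the negative term behaves almost like a hinge on the most similar negative above λ, while the positive term remains a smooth, softer penalty.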

**(2) Dense Passage Retrieval (DPR)**

The base model and hyperparameters used for fine-tuning are as follows.

- Model : SapBERT-KO-EN (klue/bert-base)
- Dataset : **초거대 AI 헬스케어 질의 응답 데이터 (Hyperscale AI Healthcare Question-Answering Data, AI Hub)**
- Epochs : 10
- Batch Size : 64
- Dropout : 0.1
- Pooler : 'cls'
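
`Pooler : 'cls'` most likely means that the sentence vector is the final hidden state of the first ([CLS]) token; some implementations additionally pass it through a small dense layer. The sketch below contrasts that choice with mean pooling on a toy hidden-state matrix. The `pool` helper and its `'mean'` option are illustrative assumptions, not the DPR-KO API.

```python
from typing import List

Matrix = List[List[float]]  # [sequence_length][hidden_size]


def pool(hidden_states: Matrix, attention_mask: List[int], pooler: str = "cls") -> List[float]:
    """Turn per-token hidden states into a single sentence vector.

    'cls'  : take the first token's vector ([CLS] in BERT-style models).
    'mean' : average the vectors of non-padding tokens (shown for contrast).
    """
    if pooler == "cls":
        return list(hidden_states[0])
    if pooler == "mean":
        kept = [h for h, m in zip(hidden_states, attention_mask) if m == 1]
        dim = len(hidden_states[0])
        return [sum(h[d] for h in kept) / len(kept) for d in range(dim)]
    raise ValueError(f"unknown pooler: {pooler}")


# Toy hidden states for "[CLS] tok1 [SEP] [PAD]" (hypothetical numbers).
hidden = [
    [0.5, 0.1],  # [CLS]
    [0.2, 0.9],  # tok1
    [0.1, 0.2],  # [SEP]
    [0.0, 0.0],  # [PAD], ignored by mean pooling via the mask
]
mask = [1, 1, 1, 0]

print(pool(hidden, mask, "cls"))  # [0.5, 0.1]
print(pool(hidden, mask, "mean"))
```

CLS pooling relies on fine-tuning to concentrate sentence-level information in the first position, which is the setting both training stages above report using.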

## 4. Example
This model encodes questions and must be used together with the Context model.
You can confirm that a question and a text about the same disease show high similarity.

```python
```

## Citing
```
```