snumin44 commited on
Commit
51f62ca
ยท
verified ยท
1 Parent(s): 17c997c

Create README.md

Browse files
Files changed (1) hide show
  1. README.md +99 -0
README.md ADDED
@@ -0,0 +1,99 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: mit
3
+ language:
4
+ - ko
5
+ base_model:
6
+ - klue/bert-base
7
+ pipeline_tag: feature-extraction
8
+ tags:
9
+ - medical
10
+ ---
11
+
12
+ # ๐ŸŠ Korean Medical DPR(Dense Passage Retrieval)
13
+
14
+ ## 1. Intro
15
+ **์˜๋ฃŒ ๋ถ„์•ผ**์—์„œ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” Bi-Encoder ๊ตฌ์กฐ์˜ ๊ฒ€์ƒ‰ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.
16
+ ํ•œยท์˜ ํ˜ผ์šฉ์ฒด์˜ ์˜๋ฃŒ ๊ธฐ๋ก์„ ์ฒ˜๋ฆฌํ•˜๊ธฐ ์œ„ํ•ด **SapBERT-KO-EN** ์„ ๋ฒ ์ด์Šค ๋ชจ๋ธ๋กœ ์ด์šฉํ–ˆ์Šต๋‹ˆ๋‹ค.
17
+ ์งˆ๋ฌธ์€ Question Encoder๋กœ, ํ…์ŠคํŠธ๋Š” Context Encoder๋ฅผ ์ด์šฉํ•ด ์ธ์ฝ”๋”ฉํ•ฉ๋‹ˆ๋‹ค.
18
+
19
+ - Question Encoder : [https://huggingface.co/snumin44/medical-biencoder-ko-bert-question](https://huggingface.co/snumin44/medical-biencoder-ko-bert-question)
20
+
21
+ (โ€ป ์ด ๋ชจ๋ธ์€ AI Hub์˜ [์ดˆ๊ฑฐ๋Œ€ AI ํ—ฌ์Šค์ผ€์–ด ์งˆ์˜ ์‘๋‹ต ๋ฐ์ดํ„ฐ](https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&dataSetSn=71762)๋กœ ํ•™์Šตํ•œ ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.)
22
+
23
+
24
+ ## 2. Model
25
+
26
+ **(1) Self Alignment Pretraining (SAP)**
27
+
28
+ ํ•œ๊ตญ ์˜๋ฃŒ ๊ธฐ๋ก์€ **ํ•œยท์˜ ํ˜ผ์šฉ์ฒด**๋กœ ์“ฐ์—ฌ, ์˜์–ด ์šฉ์–ด๋„ ์ธ์‹ํ•  ์ˆ˜ ์žˆ๋Š” ๋ชจ๋ธ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.
29
+ Multi Similarity Loss๋ฅผ ์ด์šฉํ•ด **๋™์ผํ•œ ์ฝ”๋“œ์˜ ์šฉ์–ด** ๊ฐ„์— ๋†’์€ ์œ ์‚ฌ๋„๋ฅผ ๊ฐ–๋„๋ก ํ•™์Šตํ–ˆ์Šต๋‹ˆ๋‹ค.
30
+ ```
31
+ ์˜ˆ) C3843080 || ๊ณ ํ˜ˆ์•• ์งˆํ™˜
32
+ C3843080 || Hypertension
33
+ C3843080 || High Blood Pressure
34
+ C3843080 || HTN
35
+ C3843080 || HBP
36
+ ```
37
+
38
+
39
+ - SapBERT-KO-EN : [https://huggingface.co/snumin44/sap-bert-ko-en](https://huggingface.co/snumin44/sap-bert-ko-en)
40
+ - Github : [https://github.com/snumin44/SapBERT-KO-EN](https://github.com/snumin44/SapBERT-KO-EN)
41
+
42
+ **(2) Dense Passage Retrieval (DPR)**
43
+
44
+ SapBERT-KO-EN์„ ๊ฒ€์ƒ‰ ๋ชจ๋ธ๋กœ ๋งŒ๋“ค๊ธฐ ์œ„ํ•ด ์ถ”๊ฐ€์ ์ธ Fine-tuning์„ ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
45
+ Bi-Encoder ๊ตฌ์กฐ๋กœ ์งˆ์˜์™€ ํ…์ŠคํŠธ์˜ ์œ ์‚ฌ๋„๋ฅผ ๊ณ„์‚ฐํ•˜๋Š” DPR ๋ฐฉ์‹์œผ๋กœ Fine-tuning ํ–ˆ์Šต๋‹ˆ๋‹ค.
46
+ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๊ธฐ์กด์˜ ๋ฐ์ดํ„ฐ ์…‹์— **ํ•œยท์˜ ํ˜ผ์šฉ์ฒด ์ƒ˜ํ”Œ์„ ์ฆ๊ฐ•**ํ•œ ๋ฐ์ดํ„ฐ ์…‹์„ ์‚ฌ์šฉํ–ˆ์Šต๋‹ˆ๋‹ค.
47
+ ```
48
+ ์˜ˆ) ํ•œ๊ตญ์–ด ๋ณ‘๋ช…: ๊ณ ํ˜ˆ์••
49
+ ์˜์–ด ๋ณ‘๋ช…: Hypertenstion
50
+ ์งˆ์˜ (์›๋ณธ): ์•„๋ฒ„์ง€๊ฐ€ ๊ณ ํ˜ˆ์••์ธ๋ฐ ๊ทธ๊ฒŒ ๋ญ”์ง€ ๋ชจ๋ฅด๊ฒ ์–ด. ๊ณ ํ˜ˆ์••์ด ๋ญ”์ง€ ์„ค๋ช…์ข€ ํ•ด์ค˜.
51
+ ์งˆ์˜ (์ฆ๊ฐ•): ์•„๋ฒ„์ง€๊ฐ€ Hypertenstion ์ธ๋ฐ ๊ทธ๊ฒŒ ๋ญ”์ง€ ๋ชจ๋ฅด๊ฒ ์–ด. Hypertenstion ์ด ๋ญ”์ง€ ์„ค๋ช…์ข€ ํ•ด์ค˜.
52
+ ```
53
+
54
+ - Github : [https://github.com/snumin44/DPR-KO](https://github.com/snumin44/DPR-KO)
55
+
56
+
57
+ ## 3. Training
58
+
59
+ **(1) Self Alignment Pretraining (SAP)**
60
+
61
+ SapBERT-KO-EN ํ•™์Šต์— ํ™œ์šฉํ•œ ๋ฒ ์ด์Šค ๋ชจ๋ธ ๋ฐ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.
62
+ ํ•œยท์˜ ์˜๋ฃŒ ์šฉ์–ด๋ฅผ ์ˆ˜๋กํ•œ ์˜๋ฃŒ ์šฉ์–ด ์‚ฌ์ „์ธ **KOSTOM**์„ ํ•™์Šต ๋ฐ์ดํ„ฐ๋กœ ์‚ฌ์šฉํ–ˆ์Šต๋‹ˆ๋‹ค.
63
+
64
+ - Model : klue/bert-base
65
+ - Dataset : **KOSTOM**
66
+ - Epochs : 1
67
+ - Batch Size : 64
68
+ - Max Length : 64
69
+ - Dropout : 0.1
70
+ - Pooler : 'cls'
71
+ - Eval Step : 100
72
+ - Threshold : 0.8
73
+ - Scale Positive Sample : 1
74
+ - Scale Negative Sample : 60
75
+
76
+ **(2) Dense Passage Retrieval (DPR)**
77
+
78
+ Fine-tuning์— ํ™œ์šฉํ•œ ๋ฒ ์ด์Šค ๋ชจ๋ธ ๋ฐ ํ•˜์ดํผ ํŒŒ๋ผ๋ฏธํ„ฐ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.
79
+
80
+ - Model : SapBERT-KO-EN(klue/bert-base)
81
+ - Dataset : **์ดˆ๊ฑฐ๋Œ€ AI ํ—ฌ์Šค์ผ€์–ด ์งˆ์˜ ์‘๋‹ต ๋ฐ์ดํ„ฐ(AI Hub)**
82
+ - Epochs : 10
83
+ - Batch Size : 64
84
+ - Dropout : 0.1
85
+ - Pooler : 'cls'
86
+
87
+
88
+ ## 4. Example
89
+ ์ด ๋ชจ๋ธ์€ ์งˆ๋ฌธ์„ ์ธ์ฝ”๋”ฉํ•˜๋Š” ๋ชจ๋ธ๋กœ, Context ๋ชจ๋ธ๊ณผ ํ•จ๊ป˜ ์‚ฌ์šฉํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
90
+ ๋™์ผํ•œ ์งˆ๋ณ‘์— ๊ด€ํ•œ ์งˆ๋ฌธ๊ณผ ํ…์ŠคํŠธ๊ฐ€ ๋†’์€ ์œ ์‚ฌ๋„๋ฅผ ๋ณด์ธ๋‹ค๋Š” ์‚ฌ์‹ค์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
91
+
92
+ ```python
93
+ ```
94
+
95
+
96
+ ## Citing
97
+ ```
98
+
99
+ ```