## 4. Example

This model encodes questions, so it must be used together with the Context model.
The example below confirms that a question and a text about the same disease show high similarity.

(※ The sample medical texts in the code below were written with ChatGPT.)
(※ Due to the nature of the training data, the model works better on relatively clean, well-structured text such as these.)
95 |
```python
|
96 |
+
import numpy as np
|
97 |
+
from transformers import AutoModel, AutoTokenizer
|
98 |
+
|
99 |
+
# Question Model
|
100 |
+
q_model_path = 'snumin44/medical-biencoder-ko-bert-question'
|
101 |
+
q_model = AutoModel.from_pretrained(q_model_path)
|
102 |
+
q_tokenizer = AutoTokenizer.from_pretrained(q_model_path)
|
103 |
+
|
104 |
+
# Context Model
|
105 |
+
c_model_path = 'snumin44/medical-biencoder-ko-bert-context'
|
106 |
+
c_model = AutoModel.from_pretrained(c_model_path)
|
107 |
+
c_tokenizer = AutoTokenizer.from_pretrained(c_model_path)
|
108 |
+
|
109 |
+
|
110 |
+
query = 'high blood pressure ์ฒ๋ฐฉ ์ฌ๋ก'
|
111 |
+
|
112 |
+
targets = [
|
113 |
+
"""๊ณ ํ์ ์ง๋จ.
|
114 |
+
ํ์ ์๋ด ๋ฐ ์ํ์ต๊ด ๊ต์ ๊ถ๊ณ . ์ ์ผ์, ๊ท์น์ ์ธ ์ด๋, ๊ธ์ฐ, ๊ธ์ฃผ ์ง์.
|
115 |
+
ํ์ ์ฌ๋ฐฉ๋ฌธ. ํ์: 150/95mmHg. ์ฝ๋ฌผ์น๋ฃ ์์. Amlodipine 5mg 1์ผ 1ํ ์ฒ๋ฐฉ.""",
|
116 |
+
|
117 |
+
"""์๊ธ์ค ๋์ฐฉ ํ ์ ๋ด์๊ฒฝ ์งํ.
|
118 |
+
์๊ฒฌ: Gastric ulcer์์ Forrest IIb ๊ด์ฐฐ๋จ. ์ถํ์ ์๋์ ์ผ์ถ์ฑ ์ถํ ํํ.
|
119 |
+
์ฒ์น: ์ํผ๋คํ๋ฆฐ ์ฃผ์ฌ๋ก ์ถํ ๊ฐ์ ํ์ธ. Hemoclip 2๊ฐ๋ก ์ถํ ๋ถ์ ํด๋ฆฌํํ์ฌ ์งํ ์๋ฃ.""",
|
120 |
+
|
121 |
+
"""ํ์ค ๋์ ์ง๋ฐฉ ์์น ๋ฐ ์ง๋ฐฉ๊ฐ ์๊ฒฌ.
|
122 |
+
๋ค๋ฐ์ฑ gallstones ํ์ธ. ์ฆ์ ์์ ๊ฒฝ์ฐ ๊ฒฝ๊ณผ ๊ด์ฐฐ ๊ถ์ฅ.
|
123 |
+
์ฐ์ธก renal cyst, ์์ฑ ๊ฐ๋ฅ์ฑ ๋์ผ๋ฉฐ ์ถ๊ฐ์ ์ธ ์ฒ์น ๋ถํ์ ํจ."""
|
124 |
+
]
|
125 |
+
|
126 |
+
query_feature = q_tokenizer(query, return_tensors='pt')
|
127 |
+
query_outputs = q_model(**query_feature, return_dict=True)
|
128 |
+
query_embeddings = query_outputs.pooler_output.detach().numpy().squeeze()
|
129 |
+
|
130 |
+
def cos_sim(A, B):
|
131 |
+
return np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))
|
132 |
+
|
133 |
+
for idx, target in enumerate(targets):
|
134 |
+
target_feature = c_tokenizer(target, return_tensors='pt')
|
135 |
+
target_outputs = c_model(**target_feature, return_dict=True)
|
136 |
+
target_embeddings = target_outputs.pooler_output.detach().numpy().squeeze()
|
137 |
+
similarity = cos_sim(query_embeddings, target_embeddings)
|
138 |
+
print(f"Similarity between query and target {idx}: {similarity:.4f}")
|
139 |
+
```
|
140 |
+
```
|
141 |
+
Similarity between query and target 0: 0.2674
|
142 |
+
Similarity between query and target 1: 0.0416
|
143 |
+
Similarity between query and target 2: 0.0476
|
144 |
```
|
145 |
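For a larger pool of contexts, the per-pair loop above can be replaced by stacking the context embeddings into a matrix and computing all cosine similarities in one NumPy operation. This is a minimal sketch: the helper `cos_sim_matrix` is not part of the model card, and random vectors stand in for the `pooler_output` embeddings that the models above would produce.

```python
import numpy as np

def cos_sim_matrix(query_emb, context_embs):
    """Cosine similarity between one query vector and a stack of context vectors."""
    # Normalize the query and each row of the context matrix, then take dot products.
    query_norm = query_emb / np.linalg.norm(query_emb)
    context_norms = context_embs / np.linalg.norm(context_embs, axis=1, keepdims=True)
    return context_norms @ query_norm

# Stand-ins for real model embeddings (768-dim, as with BERT-base pooler outputs).
rng = np.random.default_rng(0)
query_emb = rng.standard_normal(768)
context_embs = rng.standard_normal((3, 768))

scores = cos_sim_matrix(query_emb, context_embs)
best = int(np.argmax(scores))
print(f"Best matching context: {best}, score: {scores[best]:.4f}")
```

Because both vectors are normalized before the dot product, each entry of `scores` equals the pairwise `cos_sim` used in the loop above, so `np.argmax(scores)` directly gives the best-matching context.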
## Citing
```
@inproceedings{liu2021self,
  title={Self-Alignment Pretraining for Biomedical Entity Representations},
  author={Liu, Fangyu and Shareghi, Ehsan and Meng, Zaiqiao and Basaldella, Marco and Collier, Nigel},
  booktitle={Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  pages={4228--4238},
  month=jun,
  year={2021}
}
@article{karpukhin2020dense,
  title={Dense Passage Retrieval for Open-Domain Question Answering},
  author={Karpukhin, Vladimir and Oğuz, Barlas and Min, Sewon and Lewis, Patrick and Wu, Ledell and Edunov, Sergey and Chen, Danqi and Yih, Wen-tau},
  journal={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2020}
}
```