snumin44 committed on
Commit e06d56a · verified · 1 Parent(s): 51f62ca

Update README.md

Files changed (1)
  1. README.md +66 -2
README.md CHANGED
@@ -87,13 +87,77 @@ The base model and hyperparameters used for fine-tuning are as follows
 
  ## 4. Example
  This model encodes questions and must be used together with the Context model.
- You can see that a question and a text about the same disease have a high similarity score.
+ You can see that a question and a text about the same disease have a high similarity score.
+
+ (※ The example texts in the code below are medical texts generated with ChatGPT.)
+ (※ Because of the characteristics of the training data, the model works better on more refined text than these examples.)
 
95
  ```python
96
+ import numpy as np
97
+ from transformers import AutoModel, AutoTokenizer
98
+
99
+ # Question Model
100
+ q_model_path = 'snumin44/medical-biencoder-ko-bert-question'
101
+ q_model = AutoModel.from_pretrained(q_model_path)
102
+ q_tokenizer = AutoTokenizer.from_pretrained(q_model_path)
103
+
104
+ # Context Model
105
+ c_model_path = 'snumin44/medical-biencoder-ko-bert-context'
106
+ c_model = AutoModel.from_pretrained(c_model_path)
107
+ c_tokenizer = AutoTokenizer.from_pretrained(c_model_path)
108
+
109
+
110
+ query = 'high blood pressure ์ฒ˜๋ฐฉ ์‚ฌ๋ก€'
111
+
112
+ targets = [
113
+ """๊ณ ํ˜ˆ์•• ์ง„๋‹จ.
114
+ ํ™˜์ž ์ƒ๋‹ด ๋ฐ ์ƒํ™œ์Šต๊ด€ ๊ต์ • ๊ถŒ๊ณ . ์ €์—ผ์‹, ๊ทœ์น™์ ์ธ ์šด๋™, ๊ธˆ์—ฐ, ๊ธˆ์ฃผ ์ง€์‹œ.
115
+ ํ™˜์ž ์žฌ๋ฐฉ๋ฌธ. ํ˜ˆ์••: 150/95mmHg. ์•ฝ๋ฌผ์น˜๋ฃŒ ์‹œ์ž‘. Amlodipine 5mg 1์ผ 1ํšŒ ์ฒ˜๋ฐฉ.""",
116
+
117
+ """์‘๊ธ‰์‹ค ๋„์ฐฉ ํ›„ ์œ„ ๋‚ด์‹œ๊ฒฝ ์ง„ํ–‰.
118
+ ์†Œ๊ฒฌ: Gastric ulcer์—์„œ Forrest IIb ๊ด€์ฐฐ๋จ. ์ถœํ˜ˆ์€ ์†Œ๋Ÿ‰์˜ ์‚ผ์ถœ์„ฑ ์ถœํ˜ˆ ํ˜•ํƒœ.
119
+ ์ฒ˜์น˜: ์—ํ”ผ๋„คํ”„๋ฆฐ ์ฃผ์‚ฌ๋กœ ์ถœํ˜ˆ ๊ฐ์†Œ ํ™•์ธ. Hemoclip 2๊ฐœ๋กœ ์ถœํ˜ˆ ๋ถ€์œ„ ํด๋ฆฌํ•‘ํ•˜์—ฌ ์ง€ํ˜ˆ ์™„๋ฃŒ.""",
120
+
121
+ """ํ˜ˆ์ค‘ ๋†’์€ ์ง€๋ฐฉ ์ˆ˜์น˜ ๋ฐ ์ง€๋ฐฉ๊ฐ„ ์†Œ๊ฒฌ.
122
+ ๋‹ค๋ฐœ์„ฑ gallstones ํ™•์ธ. ์ฆ์ƒ ์—†์„ ๊ฒฝ์šฐ ๊ฒฝ๊ณผ ๊ด€์ฐฐ ๊ถŒ์žฅ.
123
+ ์šฐ์ธก renal cyst, ์–‘์„ฑ ๊ฐ€๋Šฅ์„ฑ ๋†’์œผ๋ฉฐ ์ถ”๊ฐ€์ ์ธ ์ฒ˜์น˜ ๋ถˆํ•„์š” ํ•จ."""
124
+ ]
125
+
126
+ query_feature = q_tokenizer(query, return_tensors='pt')
127
+ query_outputs = q_model(**query_feature, return_dict=True)
128
+ query_embeddings = query_outputs.pooler_output.detach().numpy().squeeze()
129
+
130
+ def cos_sim(A, B):
131
+ return np.dot(A, B) / (np.linalg.norm(A) * np.linalg.norm(B))
132
+
133
+ for idx, target in enumerate(targets):
134
+ target_feature = c_tokenizer(target, return_tensors='pt')
135
+ target_outputs = c_model(**target_feature, return_dict=True)
136
+ target_embeddings = target_outputs.pooler_output.detach().numpy().squeeze()
137
+ similarity = cos_sim(query_embeddings, target_embeddings)
138
+ print(f"Similarity between query and target {idx}: {similarity:.4f}")
139
+ ```
140
+ ```
141
+ Similarity between query and target 0: 0.2674
142
+ Similarity between query and target 1: 0.0416
143
+ Similarity between query and target 2: 0.0476
144
  ```
 
 
  ## Citing
  ```
-
+ @inproceedings{liu2021self,
+   title={Self-Alignment Pretraining for Biomedical Entity Representations},
+   author={Liu, Fangyu and Shareghi, Ehsan and Meng, Zaiqiao and Basaldella, Marco and Collier, Nigel},
+   booktitle={Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
+   pages={4228--4238},
+   month = jun,
+   year={2021}
+ }
+ @inproceedings{karpukhin2020dense,
+   title={Dense Passage Retrieval for Open-Domain Question Answering},
+   author={Karpukhin, Vladimir and Oğuz, Barlas and Min, Sewon and Lewis, Patrick and Wu, Ledell and Edunov, Sergey and Chen, Danqi and Yih, Wen-tau},
+   booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
+   year={2020}
+ }
  ```
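The README example scores each target one at a time with `cos_sim`. When there are many candidate texts, the same cosine scores can be computed in a single matrix operation and used to rank the targets. The sketch below is only an illustration under stated assumptions: it expects the embeddings to already be NumPy arrays (for instance the `query_embeddings` vector and the per-target `target_embeddings` vectors from the README code, stacked into one matrix), and the helper name `rank_by_cosine` and the toy vectors are hypothetical, not part of the model or the `transformers` API.

```python
import numpy as np


def rank_by_cosine(query_vec: np.ndarray, target_mat: np.ndarray):
    """Hypothetical helper: rank the rows of target_mat by cosine similarity to query_vec.

    query_vec:  shape (d,), e.g. a question-model embedding
    target_mat: shape (n, d), e.g. stacked context-model embeddings
    Returns (order, scores): row indices sorted from most to least similar,
    and the cosine score of every row in its original order.
    """
    # L2-normalize both sides so that a plain dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    t = target_mat / np.linalg.norm(target_mat, axis=1, keepdims=True)
    scores = t @ q               # shape (n,): one cosine score per target row
    order = np.argsort(-scores)  # indices in descending order of similarity
    return order, scores


if __name__ == "__main__":
    # Made-up 3-dimensional vectors standing in for real model embeddings
    query = np.array([1.0, 0.0, 1.0])
    targets = np.array([
        [0.0, 1.0, 0.0],  # orthogonal to the query -> cosine 0
        [2.0, 0.0, 2.0],  # same direction, different length -> cosine 1
        [1.0, 1.0, 0.0],  # partially aligned -> cosine 0.5
    ])
    order, scores = rank_by_cosine(query, targets)
    for rank, idx in enumerate(order):
        print(f"rank {rank}: target {idx} (cosine {scores[idx]:.4f})")
```

Normalizing once and taking a matrix product yields the same values as calling `cos_sim` per pair. If the context embeddings are kept in a vector index, storing them pre-normalized would let a plain inner-product search return cosine similarity.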