Kevin Wu committed on
Commit
9ae2c40
0 Parent(s):
Files changed (4)
  1. README.md +1 -0
  2. prompts.py +505 -0
  3. requirements.txt +3 -0
  4. run_extraction.py +202 -0
README.md ADDED
@@ -0,0 +1 @@
+ A note extraction app hosted on Hugging Face Spaces.
prompts.py ADDED
@@ -0,0 +1,505 @@
+ info_prompt = """For each clinical note, extract the following fields into a structured XML format.
+ For each of the following fields, return the value if it exists in the notes, otherwise do not return anything between tags.
+ For example, if the notes mention that the patient's name is "John Doe", then the output should be <patient_name>John Doe</patient_name>.
+ Otherwise, if the notes do not mention the patient's name, then do not return anything between <patient_name> and </patient_name>.
+ Additionally, add a <reasoning> tag with the reasoning for why you chose the value you did. This reasoning should be specific to the note and the patient, and if a field is found, it should contain a brief verbatim quote from the notes that is used to justify the value.
+
+ - patient_name
+   - The patient's full name (first and last name).
+   - For example:
+     <patient_name>
+       <reasoning>[REASONING]</reasoning>
+       <first_name>John</first_name>
+       <last_name>Doe</last_name>
+     </patient_name>
+ - date_of_birth
+   - The patient's date of birth in the format YYYY-MM-DD.
+   - For example:
+     <date_of_birth>
+       <reasoning>[REASONING]</reasoning>
+       <date>1990-01-01</date>
+     </date_of_birth>
+ - sex
+   - The patient's sex (M or F).
+   - For example:
+     <sex>
+       <reasoning>[REASONING]</reasoning>
+       <sex>M</sex>
+     </sex>
+ - traditional_chemo
+   - Any traditional chemotherapy drugs the patient has taken or has been prescribed.
+   - Within the tags, put a list of all the traditional chemotherapy drugs the patient has taken or has been prescribed as well as the date, if specified.
+   - For example:
+     <traditional_chemo>
+       <reasoning>[REASONING]</reasoning>
+       <drug>Doxorubicin/Adriamycin</drug>
+       <date>2021-01-01</date>
+     </traditional_chemo>
+   - The following are the traditional chemotherapy drugs you should look for:
+     - Doxorubicin/Adriamycin
+     - Carboplatin
+     - Vinblastine
+     - Chlorambucil/Leukeran
+     - Lomustine/CCNU
+     - Mitoxantrone
+     - Cyclophosphamide/Cytoxan
+     - Vinorelbine
+     - Vincristine
+     - CHOP protocol
+     - Actinomycin
+     - VAC Protocol
+     - Tanovea
+     - L-asparaginase
+     - Melphalan
+     - Satraplatin
+     - Epirubicin
+     - Neoplasene
+     - MOPP Chemotherapy
+     - Satraplatin metronomic
+     - Gemcitabine/Gemzar
+     - Fluorouracil (5-FU)
+     - Laverdia
+     - Temodar
+     - Unspecified traditional chemo
+     - No Traditional Chemo to-date
+     - No traditional chemo reported - validated
+     - Tamozolamide/Temodar
+     - Unknown
+     - Cisplatin
+     - Mustargen
+     - Procarbazine
+     - Mitotane
+
+ - other_cancer_treatments
+   - Any other cancer treatments the patient has taken or has been prescribed, according to the list provided below.
+   - Within the tags, put a list of all the other cancer treatments the patient has taken or has been prescribed as well as the date, if specified.
+   - For example:
+     <other_cancer_treatments>
+       <reasoning>[REASONING]</reasoning>
+       <treatment>Radiation Therapy</treatment>
+       <date>2021-01-01</date>
+     </other_cancer_treatments>
+   - The following are the other cancer treatments you should look for:
+     - Radiation Therapy
+     - Palladia/Toceranib
+     - Melanoma Vaccine (Oncept)
+     - Electrochemotherapy
+     - Masatinib
+     - Autologous Vaccine/Torigen/Ardent
+     - Lapatinib
+     - I'm Yunity
+     - Yunnan Baiyao
+     - Previcox
+     - Yale Vaccine
+     - Rapamycin
+     - Listeria Vaccine
+     - Imatinib
+     - Trametinib
+     - Zoledronate
+     - Dexrazoxane/Zinecard
+     - Firocoxib
+     - Olaparib
+     - Dasatinib
+     - Vorinostat
+     - Mistletoe Therapy
+     - EGFR Vaccine
+     - No other cancer treatments reported - validated
+     - Palbociclib
+     - No other cancer treatments reported to-date
+     - Sorafenib
+     - Nanoparticle Infusion
+     - Nanoparticle Laser
+     - Laser Therapy
+     - Stelfonta
+     - Tanovea
+     - T-Cell infusions
+     - Losartan
+     - Naltrexone
+     - Immunoregulin
+     - Papilloma Vaccine
+     - Gilvetmab
+     - Unknown
+
+ - other_conmeds
+   - Any other concomitant medications the patient has taken or has been prescribed.
+   - Within the tags, put a list of all the other concomitant medications the patient has taken or has been prescribed as well as the date, if specified.
+   - For example:
+     <other_conmeds>
+       <reasoning>[REASONING]</reasoning>
+       <medication>Aspirin</medication>
+       <date>2021-01-01</date>
+     </other_conmeds>
+   - The following are the other concomitant medications you should look for:
+     - Piroxicam
+     - Gabapentin
+     - Carprofen/Rimadyl
+     - Denamarin
+     - Ursodiol
+     - Clavamox
+     - Cerenia/Maropitant
+     - Ondansetron/Zofran
+     - Meloxicam/Metacam
+     - Pimobendan/Vetmedin
+     - Losartan/Cozaar
+     - Capromorelin/Entyce
+     - Cetirizine/Zyrtec
+     - Tacrolimus
+     - Codeine
+     - Telmisartan
+     - Buprenorphine
+     - Apoquel/Oclacitinib
+     - Imuquin
+     - Amlodipine
+     - Loratadine/Claritin
+     - Benazepril
+     - Metronidazole/Flagyl
+     - Prednisone
+     - Adequan
+     - Convenia
+     - B12 Injections
+     - Cisapride
+     - Budesonide
+     - Hepatoclear
+     - Dasaquin
+     - Cytopoint Injections
+     - Glucosamine
+     - Famotidine/Pepcid
+     - Fish Oil
+     - Omeprazole/Prilosec
+     - Mirtazapine
+     - Meclizine
+     - Amantadine
+     - Cortisone
+     - Pentoxifylline
+     - Ligaplex
+     - Reishi Mushroom
+     - Immune Builder
+     - CAS Multimushroom
+     - Ketamine Injections
+     - Vitamin E
+     - Trazadone
+     - Phenobarbitol
+     - Tylan Powder
+     - Temaril-P
+     - Acepromazine
+     - Sulfasalazine
+     - Keppra
+     - Turkey Tail Mushroom
+     - Furosemide/Lasix
+     - Tramadol
+     - Ciprofloxacin
+     - Trilostane
+     - Naturvet Vitapet Vitamins
+     - Glycoflex
+     - Entederm
+     - Aluminum Hydroxide
+     - Deramaxx
+     - Doxycycline
+     - Sulcrafate
+     - Diphenhydramine
+     - Fluoxetine
+     - Nexgard
+     - Reglan/Metoclopramide
+     - Thyroxine
+     - Clindamycin
+     - Cephalexin
+     - Enalapril
+     - CBD
+     - Denosyl
+     - Galliprant
+     - Methadone
+     - Cobalequin
+     - Azodyl
+     - FortiFlora
+     - Propectalin Paste
+     - Dexamethasone
+     - Ampicillin
+     - Coriolus mushroom
+     - Oxycodone
+     - Cyproheptadine
+     - Sotalol
+     - Enrofloxacin/Baytril
+     - Amiikacin
+     - Misoprostol
+     - Chlorhexadine
+     - Neomycin
+     - Visbiome
+     - Tranexamic Acid
+     - Proin
+     - Tobramycin
+     - Avmaquin
+     - Cosyntropin (Cortosyn)
+     - Vetoryl
+     - Metoclopromide
+     - Phenylpropanolamine HCl
+     - Cosequin
+     - Osteoflex
+     - Hepato TruBenefits
+     - Rx Clay
+     - Metamucil Powder
+     - Osteo-Tru Benefits
+     - Oat Glycerite
+     - Cholodin
+     - Proviable
+     - Supplements
+     - Wuffles Joint Supplement
+     - Firocoxib/Previcox
+     - Tylosin
+     - Barium Suspension
+     - Optimmune Ointment
+     - NeoPolyDex Solution
+     - Endosorb
+     - Augmentin
+     - Butorphanol/Dolorex
+     - Prazosin
+     - Traumeel/T-Relief
+     - Deracoxib
+     - Triamcinolone
+     - Probiotic
+     - Hydrocodone
+     - Lactulose
+     - Methocarbamol
+     - Cranberry Pills
+     - Eye Meds
+     - Levothyroxine
+     - Calcitriol
+     - TMS trimethoprim sulfamethoxazole
+     - Allergy Antigen Injections
+     - Propranolol
+     - Flexadin
+     - Interceptor
+     - Thorn SAT
+     - Megaflora
+     - Pregabalin
+     - Canalevia
+     - Cefpodoxime
+     - Melatonin
+     - Phenylephrine
+     - Amoxicillin
+     - Arnica/T-Relief
+     - Aminocaproic Acid
+     - Fluconazole
+     - Gastrafate
+     - Silver Sulfadiazine
+     - Mupirocin
+     - Marbofloxacin/Zeniquin
+     - Psyllium Husk
+     - Chlorpheniramine
+     - Tagamet
+     - Multi-vitamin
+     - D-mannose/cranberry
+     - Darbepoetin
+     - Soloxine
+     - Thuja Occidentalis
+     - Pantoprazole
+     - Normosol-R
+     - Nitrofurantoin
+     - Sildenafil
+     - Hydromorphone
+     - Terbinafine
+     - Sucralfate
+     - Clopidogrel
+     - EndoBlend
+     - Omega Benefts
+     - Dexmedetomidine
+     - Levetiracetam
+     - Diethylstilbesterol
+     - Nattokinase
+     - D3 supplement
+     - Modified Chai Hu Jia Long Gu Mu Li Tang supplement
+     - Power mushrooms
+     - super greens supplement
+     - Sertraline/Zoloft
+     - Mushroom Supplement
+     - Simparica Trio/Sarolaner, moxidectin, and pyrantel
+     - 5DMM
+     - Joint Supplements
+     - Vetericyn
+     - Milk Thistle
+     - S-Adenosyl methionine
+     - Cimetidine
+     - Silver Entro Dex
+     - Desmopressin
+     - Alpha lipoic acid
+     - Unasyn
+     - Panacur/Fenbendazole
+     - Xiao Chai Hu Tang
+     - Incurin
+     - Dextrose
+     - Fresh Frozen Plasma
+     - Pamidronate Infusion
+     - Curcumin
+     - Diazoxide
+     - Clavacillin
+     - Tetracycline
+     - B9 Folic Acid
+     - Prednisolone
+     - Cyclosporine
+     - Ketaconazole
+     - Novolin-N
+     - Zonisamide
+     - Gentamicin/Phenylephrine nasal drops
+     - Stool Softener
+     - Amitriptyline
+     - Moxifloxacin
+     - Gemfibrozil
+     - Taurine
+     - Mometamax
+     - Heartgard
+     - Green Lipped Mussel Powder
+     - Chlorella Powder
+     - BioSponge
+     - Folate
+     - Cobalamine
+     - Diazepam
+     - GenOne
+     - Phenoxybenzamine
+     - Flumethrin and Imidacloprid/Seresto
+     - Forte Ion Gut Health
+     - Dispel Stasis
+     - Blood Remaker + Immune Support with Mushrooms
+     - Pet Tab
+     - Omega 3 Supplement
+     - PIQRAY/Alpelisib
+     - Vitamin K
+     - Quadriplex
+     - Colchicine
+     - Thyro-Tabs
+     - Alprazolam/Xanax
+     - Spironolactone
+     - Vetstarch
+     - Enoxaparin
+     - Diclofenac/Voltaren
+     - Routin
+     - Doxepin
+     - Erythromycin
+     - Keterolac
+     - Tromethamine
+     - Cyclosporine/Atopica
+     - Pantoea agglomerans
+     - Oxybutynin
+     - Amikacin
+     - Levemir
+     - Apocaps
+     - Life Gold
+     - Red Clover Blossoms powder
+     - Modified Citrus Pectin
+     - Epinephrine
+     - Vitamin C
+     - Azathioprine
+     - RBC Transfusion
+     - Bactrim/sulfamethoxazole & trimethoprim
+     - Pet ReLeaf
+     - NeoPolyBac Ophthalmic
+     - Antibiotics (unspecified)
+     - Azithromycin
+     - Alendronate
+     - Cafazolin
+     - Diltiazem
+     - Mexiletine
+     - Pure IP6
+     - VetInsulin
+     - Herbal Supplements
+     - San Qi Formula
+     - Amnivast
+     - Crananidin
+     - Movoflex
+     - Lidocaine
+     - Tamsulosin/Flowmax
+     - Bedinvetmab/Librela
+     - Calcium Carbonate/Tums
+     - Dermatrophin
+     - Temozolomide
+     - Midazolam
+     - Anipryl
+     - Theophylline
+     - Sodium Bicarbonate
+     - RenaKare
+     - Hydroxazine
+     - Zincard/Dexrazoxane
+     - Animax
+     - Pro-Pectalin
+     - Ellevet CHews
+     - Cordyceps
+     - Benadryl
+     - Albon
+     - Robenacoxib (Onsior)
+     - Lysine
+     - Myos muscle building supplement
+     - Iron injections
+     - Xyzal (L-Cefirizine)
+     - Clavacillin
+     - Loperamide
+     - Theracurmin
+     - Quercetin Phytosome
+     - Anti-Neoplasia
+     - Fiber Supplement
+     - Zinc Supplement
+ - surgery
+   - Whether surgical resection of the tumor was performed.
+   - For example:
+     <surgery>
+       <reasoning>[REASONING]</reasoning>
+       <resection>Yes</resection>
+     </surgery>
+ - surgery_outcome
+   - The outcome of the surgery.
+   - For example:
+     <surgery_outcome>
+       <reasoning>[REASONING]</reasoning>
+       <outcome>Completely Excised</outcome>
+     </surgery_outcome>
+   - The following are the possible outcomes of the surgery:
+     - Completely Excised
+     - Incompletely Excised
+     - Unknown
+ - metastasis_at_time_of_diagnosis
+   - Whether the cancer had spread to other parts of the body at the time of diagnosis.
+   - For example:
+     <metastasis_at_time_of_diagnosis>
+       <reasoning>[REASONING]</reasoning>
+       <metastasis>Yes</metastasis>
+     </metastasis_at_time_of_diagnosis>
+   - The following are the possible values:
+     - Yes
+     - No
+     - Unknown
+
+ - compounding_pharmacy
+   - If a compounding pharmacy is listed, extract the name of the pharmacy. Do not include "fidocure" as a pharmacy name.
+   - For example:
+     <compounding_pharmacy>
+       <reasoning>[REASONING]</reasoning>
+       <pharmacy>CVS Pharmacy</pharmacy>
+     </compounding_pharmacy>
+
+ - adverse_effects
+   - Any adverse effects the patient has experienced from the medications. For each adverse effect, extract the following fields:
+     - The name of the medication
+     - The dosage of the medication
+     - The date the adverse effect started
+     - A description of the adverse effect
+   - For example:
+     <adverse_effects>
+       <reasoning>[REASONING]</reasoning>
+       <medication>Doxorubicin/Adriamycin</medication>
+       <dosage>20 mg/kg</dosage>
+       <date>2021-01-01</date>
+       <description>Nausea</description>
+     </adverse_effects>
+
+ - date_of_death
+   - The date of death of the patient, if it is known.
+   - For example:
+     <date_of_death>
+       <reasoning>[REASONING]</reasoning>
+       <date>2021-01-01</date>
+     </date_of_death>
+
+ - weight
+   - The weight of the patient, if it is known. Convert all weights to kilograms.
+   - For example:
+     <weight>
+       <reasoning>[REASONING]</reasoning>
+       <weight>20 kg</weight>
+     </weight>
+ """
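Review note: the prompt asks for dates as YYYY-MM-DD and for weights in kilograms, but nothing in this commit checks the model's output against those rules. A minimal sketch of such a post-check follows. The helpers `is_iso_date` and `to_kg` are hypothetical (they are not part of this repository), and the set of accepted weight units is an assumption.

```python
import re
from datetime import date
from typing import Optional

LB_PER_KG = 2.20462  # approximate number of pounds in one kilogram


def is_iso_date(value: str) -> bool:
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2021-02-30
    except ValueError:
        return False
    return True


def to_kg(value: str) -> Optional[float]:
    """Parse strings such as '20 kg' or '44 lb'; return kilograms, or None if unparseable."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(kg|kgs|lb|lbs)\s*", value, re.IGNORECASE)
    if match is None:
        return None
    amount = float(match.group(1))
    unit = match.group(2).lower()
    return amount if unit.startswith("kg") else amount / LB_PER_KG


print(is_iso_date("2021-01-01"), is_iso_date("01/01/2021"))
print(to_kg("20 kg"), to_kg("44 lb"))
```

A check like this could run on the `date` and `weight` values after parsing, flagging rows for manual review rather than silently correcting them.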
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ gradio
+ openai
+ pandas
run_extraction.py ADDED
@@ -0,0 +1,202 @@
+ import os
+ import re
+ import time
+ import xml.etree.ElementTree as ET
+
+ import gradio as gr
+ import pandas as pd
+ from openai import OpenAI
+
+ # Local module that must define OPENAI_API_KEY; it is not one of the files in this commit.
+ import api_keys
+ # prompts.py is added next to this script in this commit, so import it directly.
+ import prompts
+
+ client = OpenAI(api_key=api_keys.OPENAI_API_KEY)
+
+ model_name = "gpt-4o-2024-08-06"
+
+ # Assistant used by process(); note that a new assistant is created on every start-up.
+ demo = client.beta.assistants.create(
+     name="Information Extractor",
+     instructions="Extract information from this note.",
+     model=model_name,
+     tools=[{"type": "file_search"}],
+ )
+
+ def _text(element):
+     """Return the stripped text of an element, or None if it is missing or empty."""
+     if element is None or element.text is None:
+         return None
+     return element.text.strip()
+
+
+ def parse_xml_response(xml_string: str) -> pd.DataFrame:
+     """
+     Parse the XML response from the model and extract all fields into a dictionary,
+     then convert it to a single-row pandas DataFrame whose columns form a
+     three-level MultiIndex (category, index, field).
+     """
+     # Keep only the span from the first opening tag to the last closing tag
+     xml_content = re.search(r'<.*?>.*</.*?>', xml_string, re.DOTALL)
+     if xml_content:
+         xml_string = xml_content.group(0)
+     else:
+         print("No valid XML content found.")
+         return pd.DataFrame()
+
+     try:
+         root = ET.fromstring(xml_string)
+     except ET.ParseError as e:
+         print(f"Error parsing XML: {e}")
+         return pd.DataFrame()
+
+     result = {}
+
+     for element in root:
+         tag = element.tag
+         reasoning = _text(element.find('reasoning'))
+         if tag in ['patient_name', 'date_of_birth', 'sex', 'weight', 'date_of_death',
+                    'surgery', 'surgery_outcome', 'metastasis_at_time_of_diagnosis']:
+             # Single-valued fields: keep every child other than <reasoning>
+             result[tag] = {
+                 'reasoning': reasoning,
+                 **{child.tag: _text(child) for child in element if child.tag != 'reasoning'}
+             }
+         elif tag in ['traditional_chemo', 'other_cancer_treatments', 'other_conmeds']:
+             # List-valued fields: pair each item with the <date> that directly follows it,
+             # instead of reusing the first <date> of the element for every item
+             result.setdefault(tag, [])
+             children = list(element)
+             for i, item in enumerate(children):
+                 if item.tag in ['drug', 'treatment', 'medication']:
+                     next_item = children[i + 1] if i + 1 < len(children) else None
+                     has_date = next_item is not None and next_item.tag == 'date'
+                     result[tag].append({
+                         'reasoning': reasoning,
+                         'name': _text(item),
+                         'date': _text(next_item) if has_date else None,
+                     })
+         elif tag == 'compounding_pharmacy':
+             result[tag] = {
+                 'reasoning': reasoning,
+                 'pharmacy': _text(element.find('pharmacy')),
+             }
+         elif tag == 'adverse_effects':
+             result.setdefault(tag, [])
+             effect = {'reasoning': reasoning}
+             for child in element:
+                 if child.tag != 'reasoning':
+                     effect[child.tag] = _text(child)
+             result[tag].append(effect)
+
+     # Flatten to {(category, index, field): [value]} for a single-row DataFrame
+     df_data = {}
+     for key, value in result.items():
+         if isinstance(value, dict):
+             for sub_key, sub_value in value.items():
+                 df_data[(key, '1', sub_key)] = [sub_value]
+         elif isinstance(value, list):
+             for i, item in enumerate(value):
+                 for sub_key, sub_value in item.items():
+                     df_data[(key, f"{i+1}", sub_key)] = [sub_value]
+         else:
+             df_data[(key, '1', '')] = [value]
+
+     if not df_data:
+         # No recognized fields: avoid building a MultiIndex from an empty column list
+         return pd.DataFrame()
+
+     # Create multi-index DataFrame
+     df = pd.DataFrame(df_data)
+     df.columns = pd.MultiIndex.from_tuples(df.columns)
+
+     return df
+
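Review note: `parse_xml_response` iterates over the children of a single root element, but the prompt never asks the model to emit one. If the model returns the fields as sibling top-level elements, `ET.fromstring` raises `ParseError`. One possible way to tolerate both shapes is sketched below; `parse_fields` is a hypothetical helper, not something this commit defines.

```python
import xml.etree.ElementTree as ET


def parse_fields(xml_string: str) -> ET.Element:
    """Return an element whose children are the extracted field elements."""
    try:
        # Works when the model already wrapped everything in one root element.
        return ET.fromstring(xml_string)
    except ET.ParseError:
        # Several top-level elements are not a well-formed document on their own;
        # wrapping them in a synthetic root makes them parseable.
        return ET.fromstring(f"<root>{xml_string}</root>")


sample = "<sex><sex>M</sex></sex><weight><weight>20 kg</weight></weight>"
root = parse_fields(sample)
print([child.tag for child in root])
```

This does not cover every case: a response consisting of exactly one unwrapped field would be treated as the root itself. Asking for a fixed root tag in the prompt may be the more robust option.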
+ def get_response(prompt, file_id, assistant_id):
+     thread = client.beta.threads.create(
+         messages=[
+             {
+                 "role": "user",
+                 # Use the prompt passed in by the caller rather than a hard-coded one
+                 "content": prompt,
+                 "attachments": [
+                     {"file_id": file_id, "tools": [{"type": "file_search"}]}
+                 ],
+             }
+         ]
+     )
+     run = client.beta.threads.runs.create_and_poll(
+         thread_id=thread.id, assistant_id=assistant_id
+     )
+     messages = list(
+         client.beta.threads.messages.list(thread_id=thread.id, run_id=run.id)
+     )
+     assert len(messages) == 1
+     message_content = messages[0].content[0].text
+     # Remove file_search citation markers so they do not end up inside the XML
+     for annotation in message_content.annotations:
+         message_content.value = message_content.value.replace(annotation.text, "")
+     return message_content.value
+
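Review note: the annotation loop in `get_response` removes citation markers that `file_search` places in the reply text. The sketch below reproduces that step on stand-in objects so it can be run without an API key. `FakeText` and `FakeAnnotation` are hypothetical stand-ins that mimic only the `value`, `annotations`, and `text` attributes used above, and the marker string is an assumed example of what such a citation can look like.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeAnnotation:
    text: str  # the literal marker as it appears in the reply


@dataclass
class FakeText:
    value: str
    annotations: List[FakeAnnotation] = field(default_factory=list)


def strip_annotations(text_obj) -> str:
    """Return the reply text with every annotation marker removed."""
    cleaned = text_obj.value
    for annotation in text_obj.annotations:
        cleaned = cleaned.replace(annotation.text, "")
    return cleaned


marker = "【4:0†source】"  # assumed marker format, for illustration only
reply = FakeText(
    value=f"<sex><sex>M</sex>{marker}</sex>",
    annotations=[FakeAnnotation(marker)],
)
print(strip_annotations(reply))
```

Working on a copy of the text, as here, leaves the original message object unchanged, which can make debugging easier.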
+ def process(file_content):
+     os.makedirs("cache", exist_ok=True)
+     file_name = f"cache/{time.time()}.pdf"
+     with open(file_name, "wb") as f:
+         f.write(file_content)
+
+     # Upload inside a context manager so the file handle is closed afterwards
+     with open(file_name, "rb") as f:
+         message_file = client.files.create(file=f, purpose="assistants")
+
+     response = get_response(prompts.info_prompt, message_file.id, demo.id)
+     df = parse_xml_response(response)
+
+     if df.empty:
+         return "<p>No valid information could be extracted from the provided file.</p>"
+
+     # Transpose the DataFrame: one row per (category, index, field)
+     df_transposed = df.T.reset_index()
+     df_transposed.columns = ['Category', 'Index', 'Field', 'Value']
+     df_transposed = df_transposed.sort_values(['Category', 'Index', 'Field'])
+
+     # Convert to HTML with some basic styling
+     html = df_transposed.to_html(index=False, classes='table table-striped table-bordered', escape=False)
+
+     # Add some custom CSS for better readability
+     html = f"""
+     <style>
+         .table {{
+             width: 100%;
+             max-width: 100%;
+             margin-bottom: 1rem;
+             background-color: transparent;
+         }}
+         .table td, .table th {{
+             padding: .75rem;
+             vertical-align: top;
+             border-top: 1px solid #dee2e6;
+         }}
+         .table thead th {{
+             vertical-align: bottom;
+             border-bottom: 2px solid #dee2e6;
+         }}
+         .table tbody + tbody {{
+             border-top: 2px solid #dee2e6;
+         }}
+         .table-striped tbody tr:nth-of-type(odd) {{
+             background-color: rgba(0,0,0,.05);
+         }}
+     </style>
+     {html}
+     """
+
+     return html
+
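Review note: `process` calls `to_html(..., escape=False)`, so text taken from the note, including the verbatim quotes in `reasoning`, reaches the page unescaped. The sketch below shows what escaping does to such a value, using only the standard library. `render_rows` is a hypothetical helper for illustration; in the code above, leaving `escape` at its pandas default of `True` would be the more direct route.

```python
import html


def render_rows(rows):
    """Render (category, index, field, value) tuples as escaped HTML table rows."""
    lines = []
    for row in rows:
        cells = "".join(
            f"<td>{html.escape('' if value is None else str(value))}</td>"
            for value in row
        )
        lines.append(f"<tr>{cells}</tr>")
    return "\n".join(lines)


rows = [("sex", "1", "reasoning", 'Note says "<b>male</b>"')]
print(render_rows(rows))
```

With escaping, markup inside a quoted note is displayed as text instead of being interpreted by the browser.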
+ def gradio_interface():
+     upload_component = gr.File(label="Upload PDF", type="binary")
+     output_component = gr.HTML(label="Extracted Information")
+
+     # Named `interface` so it is not confused with the module-level assistant `demo`
+     interface = gr.Interface(
+         fn=process,
+         inputs=upload_component,
+         outputs=output_component,
+         title="Clinical Note Information Extractor",
+         description="This tool extracts key information from clinical notes in PDF format.",
+     )
+     interface.queue()
+     interface.launch()
+
+
+ if __name__ == "__main__":
+     gradio_interface()