Benjamin Consolvo committed on
Commit 7ed8641 · 1 Parent(s): efb324b

first app commit

Files changed (8)
  1. Dockerfile +0 -21
  2. OAI_CONFIG_LIST.json +14 -0
  3. README.md +146 -13
  4. app.py +551 -0
  5. intelpreventativehealthcare.py +649 -0
  6. pyproject.toml +20 -0
  7. requirements.txt +10 -1
  8. src/streamlit_app.py +0 -40
Dockerfile DELETED
@@ -1,21 +0,0 @@
- FROM python:3.9-slim
-
- WORKDIR /app
-
- RUN apt-get update && apt-get install -y \
-     build-essential \
-     curl \
-     software-properties-common \
-     git \
-     && rm -rf /var/lib/apt/lists/*
-
- COPY requirements.txt ./
- COPY src/ ./src/
-
- RUN pip3 install -r requirements.txt
-
- EXPOSE 8501
-
- HEALTHCHECK CMD curl --fail http://localhost:8501/_stcore/health
-
- ENTRYPOINT ["streamlit", "run", "src/streamlit_app.py", "--server.port=8501", "--server.address=0.0.0.0"]

OAI_CONFIG_LIST.json ADDED
@@ -0,0 +1,14 @@
+ [
+     {
+         "model": "meta-llama/Llama-3.3-70B-Instruct",
+         "base_url": "https://api.inference.denvrdata.com/v1/",
+         "api_key": "",
+         "price": [0.0, 0.0]
+     },
+     {
+         "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
+         "base_url": "https://api.inference.denvrdata.com/v1/",
+         "api_key": "",
+         "price": [0.0, 0.0]
+     }
+ ]

README.md CHANGED
@@ -1,20 +1,153 @@
  ---
- title: Preventative Healthcare
- emoji: 🚀
- colorFrom: red
- colorTo: red
- sdk: docker
- app_port: 8501
- tags:
- - streamlit
+ title: Preventative Healthcare with AutoGen
+ emoji: 🔥
+ colorFrom: yellow
+ colorTo: purple
+ sdk: streamlit
+ sdk_version: 1.42.2
+ app_file: app.py
  pinned: false
- short_description: Streamlit template space
  license: apache-2.0
+ short_description: Using AI agents for preventative healthcare maintenance
  ---
-
- # Welcome to Streamlit!
-
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
-
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
+ [//]: <Add samples here https://github.com/microsoft/autogen/tree/main/python/samples>
+
+ ## AutoGen Multi-Agent Chat Preventative Healthcare
+
+ This is a multi-agent system built on AutoGen agents, designed to automate and optimize preventative healthcare outreach. It uses multiple agents, large language models (LLMs), and asynchronous programming to streamline the process of identifying patients who meet specific screening criteria, filtering patient data, and generating personalized outreach emails.
+
+ The system uses model endpoints hosted by [Denvr Dataworks](https://www.denvrdata.com/intel) on Intel® Gaudi® accelerators, and an OpenAI-compatible API key.
+
+ Credit: Though heavily modified, the original idea comes from Mike Lynch on his [Medium blog](https://medium.com/@micklynch_6905/hospitalgpt-managing-a-patient-population-with-autogen-powered-by-gpt-4-mixtral-8x7b-ef9f54f275f1).
+
+ ### Workflow
+
+ <p align="center">
+     <img width="700" src="images/prev_healthcare_4.drawio.svg">
+ </p>
+
+ 1. **Define Screening Criteria**: After getting the general screening task from the user, the User Proxy Agent starts a conversation between the Epidemiologist Agent and the Doctor Critic Agent to define the criteria for patient outreach based on the target screening type. The output criteria are an age range (e.g., 40–70), gender, and relevant medical history.
+
+ 2. **Filter Patients**: The Data Analyst Agent filters patient data from a CSV file based on the defined criteria, including age range, gender, and medical conditions. The patient data are synthetically generated; you can find the sample data under [data/patients.csv](data/patients.csv), and a few illustrative rows are shown after this list.
+
+ 3. **Generate Outreach Emails**: The program generates outreach emails for the filtered patients using LLMs and saves them as text files.
+
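+ For reference, here is a minimal sketch of the CSV shape the filtering step expects. The header matches the app's required-column list; the two rows are made-up examples for illustration only, not rows from the shipped sample file:
+
+ ```csv
+ patient_id,First Name,Last Name,Email,Patient diagnosis summary,age,gender,condition
+ 1001,Jane,Doe,jane.doe@example.com,Elevated fasting glucose at last visit,52,F,prediabetes
+ 1002,John,Smith,john.smith@example.com,Benign polyps removed in 2019,61,M,adenomatous polyps
+ ```
+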
+ ### Setup
+
+ If you want a local copy of the application to run, clone the repository and navigate into the folder:
+
+ ```bash
+ git clone https://huggingface.co/spaces/Intel/preventative_healthcare
+ cd preventative_healthcare
+ ```
+
+ You can use the `uv` package manager to manage your virtual environment and dependencies. Initialize the `uv` project and create the virtual environment:
+
+ ```bash
+ uv init
+ uv venv
+ ```
+
+ Activate the virtual environment:
+ ```bash
+ source .venv/bin/activate
+ ```
+
+ Install dependencies:
+ ```bash
+ uv sync
+ ```
+
+ To deactivate the virtual environment when you are finished running the application:
+ ```bash
+ deactivate
+ ```
+
+ ### OpenAI API Key, Model Name, and Endpoint URL
+
+ 1. Add your OpenAI-compatible API key to the `OAI_CONFIG_LIST.json` file.
+ 2. Modify `model` and `base_url` to the model name and endpoint URL that you are using. The `OAI_CONFIG_LIST.json` file should look like:
+ ```json
+ [
+     {
+         "model": "meta-llama/Llama-3.3-70B-Instruct",
+         "base_url": "https://api.inference.denvrdata.com/v1/",
+         "api_key": "",
+         "price": [0.0, 0.0]
+     },
+     {
+         "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
+         "base_url": "https://api.inference.denvrdata.com/v1/",
+         "api_key": "",
+         "price": [0.0, 0.0]
+     }
+ ]
+ ```
+
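+ Under the hood, both the command-line script and the Streamlit app load this file through AutoGen's `config_list_from_json` (wrapped in `get_configs`), selecting one entry per model with a `filter_dict`. A minimal sketch:
+
+ ```python
+ from autogen import config_list_from_json
+
+ # Keep only the Llama entry from OAI_CONFIG_LIST.json
+ config_list_llama = config_list_from_json(
+     env_or_file="OAI_CONFIG_LIST.json",
+     filter_dict={"model": ["meta-llama/Llama-3.3-70B-Instruct"]},
+ )
+ print(config_list_llama[0]["base_url"])
+ ```
+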
+ ### Modifying prompts
+
+ To modify prompts, you can edit them in the UI, or you can edit the following files (a minimal sketch of one such file follows this list):
+
+ 1. User proxy agent: the agent responsible for passing along the user's preventative healthcare task to the other agents.
+    [prompts/user_proxy_prompt.py](prompts/user_proxy_prompt.py)
+ 2. Epidemiologist agent: the disease specialist agent, which takes the preventative healthcare task and decides on patient criteria.
+    [prompts/epidemiologist_prompt.py](prompts/epidemiologist_prompt.py)
+ 3. Doctor Critic agent: reviews the criteria from the epidemiologist and passes them along. The output is used to filter actual patients from the patient data.
+    [prompts/doctor_critic_prompt.py](prompts/doctor_critic_prompt.py)
+ 4. Outreach email: not an agent, but it still uses an LLM to build the outreach email.
+    [prompts/outreach_email_prompt.py](prompts/outreach_email_prompt.py)
+
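+ Each prompt file is a plain Python module exposing a string constant that the agents import. A minimal sketch of the shape of such a file (the wording below is illustrative, not the shipped prompt):
+
+ ```python
+ # prompts/epidemiologist_prompt.py -- illustrative content only
+ EPIDEMIOLOGIST_PROMPT = """You are an epidemiologist. Given a preventative
+ screening task, propose outreach criteria: a minimum age, a maximum age,
+ a gender (or None), and any relevant prior condition. End your reply with
+ TERMINATE."""
+ ```
+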
+ ### Example Usage
+
+ To run the app with Streamlit:
+
+ ```bash
+ streamlit run app.py
+ ```
+
+ To run the script directly from the command line instead, use:
+
+ ```bash
+ python intelpreventativehealthcare.py \
+     --oai_config "OAI_CONFIG_LIST.json" \
+     --target_screening "Type 2 Diabetes" \
+     --patients_file "data/patients.csv" \
+     --phone "123-456-7890" \
+     --email "[email protected]" \
+     --name "Benjamin Consolvo"
+ ```
+
+ The arguments are defined as follows:
+
+ - `--oai_config`: Path to the `OAI_CONFIG_LIST.json` file, which contains the model endpoints, model names, and API key.
+ - `--target_screening`: The type of screening task (e.g., "Type 2 Diabetes screening").
+ - `--patients_file`: Path to the CSV file containing patient data. Default is `data/patients.csv`.
+ - `--phone`: Phone number to include in the outreach emails. Default is `123-456-7890`.
+ - `--email`: Reply email address to include in the outreach emails. Default is `[email protected]`.
+ - `--name`: Name to include in the outreach emails. Default is `Benjamin Consolvo`.
+
+ This will process the patient data, filter it based on the specified criteria, and generate outreach emails for the matching patients. The emails are saved as text files in the `data/` directory.
+
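+ You can also drive the same pipeline from your own Python code, which is essentially what `app.py` does. A minimal sketch using the function signatures from `intelpreventativehealthcare.py` (assumes a populated `OAI_CONFIG_LIST.json` and the sample data):
+
+ ```python
+ import asyncio
+ from openai import OpenAI
+ from intelpreventativehealthcare import (
+     get_configs, target_patients_outreach, find_patients, write_outreach_emails,
+ )
+
+ llama = get_configs("OAI_CONFIG_LIST.json", {"model": ["meta-llama/Llama-3.3-70B-Instruct"]})
+ deepseek = get_configs("OAI_CONFIG_LIST.json", {"model": ["deepseek-ai/DeepSeek-R1-Distill-Llama-70B"]})
+
+ # 1. The agents agree on outreach criteria for the screening task
+ criteria = asyncio.run(target_patients_outreach("Type 2 Diabetes", llama, deepseek))
+
+ # 2. Filter the patient CSV down to matching patients
+ patients, args_criteria = asyncio.run(
+     find_patients(criteria, llama, patients_file_path="data/patients.csv")
+ )
+
+ # 3. Write one outreach email per patient into data/
+ client = OpenAI(api_key=llama[0]["api_key"], base_url=llama[0]["base_url"])
+ asyncio.run(write_outreach_emails(
+     patients, "Type 2 Diabetes", args_criteria, client, llama[0]["model"]
+ ))
+ ```
+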
+ ### 6 Lessons Learned
+
+ 1. Some LLMs perform better than others at certain tasks. While this may seem obvious, in practice you often need to adjust which LLMs you use after seeing the results. In my case, I found that [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) was much more consistent and hallucinated less than [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) for email generation.
+ 2. Setting temperature to 0 is important for getting consistent output from LLMs. In my use case, I ended up setting this creativity level to 0 across all models.
+ 3. Prompt engineering is critical when instructing LLMs on what to do. My top 3 tips:
+    - Be specific and detailed.
+    - Give exact output format examples.
+    - Tell the LLM what to do, rather than listing everything it should not do.
+
+    You can read more about prompt engineering in [OpenAI's best-practices guide](https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api).
+
+ 4. Certain tasks are easier to manage with traditional programming than by building an agent to do them. To get data from a database consistently and in a specified format, write a function rather than building an agent; the LLM may hallucinate and not carry out the task correctly. After fighting with the agents, I implemented this in a function called `get_patients_from_criteria`. When I started this project, the LLMs were inventing data that were not part of the database, even though I clearly instructed the agent to use only data from the database! To resolve this, I made sure the agent read from the database through a specific function via a tool call.
+ 5. Do operations asynchronously wherever possible. Instead of writing emails one by one in a for loop, write them all at once with `async` (see the sketch after this list).
+ 6. Code-writing tools like GitHub Copilot, Cursor, and Windsurf can save a lot of time, but you still need to pay attention to the output and understand what is going on in the code. Relying purely on code generation tools accumulates unnecessary lines of code and technical debt.
+
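+ To illustrate lesson 5, this is the shape of the concurrency pattern used in `write_outreach_emails`: build one coroutine per patient, then run them together with `asyncio.gather` instead of awaiting each email in sequence (sketch with a stubbed generator standing in for the LLM call):
+
+ ```python
+ import asyncio
+
+ async def generate_email(patient: str) -> str:
+     # Stand-in for the real LLM call, which runs in an executor
+     await asyncio.sleep(0.1)
+     return f"Dear {patient}, ..."
+
+ async def write_all(patients: list[str]) -> list[str]:
+     # One task per patient; all emails are generated concurrently
+     tasks = [generate_email(p) for p in patients]
+     return await asyncio.gather(*tasks)
+
+ emails = asyncio.run(write_all(["Jane Doe", "John Smith"]))
+ print(len(emails))
+ ```
+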
+ ### Follow Up
+
+ Get your own OpenAI-compatible API key and connect your agents to LLMs on Intel® Gaudi® accelerators with just an endpoint, courtesy of cloud provider Denvr Dataworks: https://www.denvrdata.com/intel
+
+ Chat with 6K+ fellow developers on the Intel DevHub Discord: https://discord.gg/kfJ3NKEw5t
+
+ Connect with me on LinkedIn: https://linkedin.com/in/bconsolvo

app.py ADDED
@@ -0,0 +1,551 @@
+ import streamlit as st
+ import pandas as pd
+ import asyncio
+ import io
+ import contextlib
+ import os
+ from pathlib import Path
+ from intelpreventativehealthcare import (
+     target_patients_outreach,
+     find_patients,
+     write_outreach_emails,
+     get_configs,
+ )
+ # Import the prompt templates
+ from intelpreventativehealthcare import (
+     USER_PROXY_PROMPT,
+     EPIDEMIOLOGIST_PROMPT,
+     DOCTOR_CRITIC_PROMPT,
+     OUTREACH_EMAIL_PROMPT_TEMPLATE,
+ )
+ from openai import OpenAI
+ import streamlit.components.v1 as components  # Add this import for custom HTML
+
+ # Streamlit app configuration
+ st.set_page_config(page_title="Preventative Healthcare Outreach", layout="wide")
+
+ # Title at the top of the app
+ st.title("Preventative Healthcare Outreach")
+ st.markdown("""
+ Visit the README page below to learn how the agentic system works. The system uses AI agents to generate outreach criteria, filter patients, and ultimately write outreach emails. To get the agents working, you can follow these steps:
+ 1. Optionally, customize the prompts of the agents, or just use the default ones to get started.
+ 2. Select default patient data, or upload your own CSV file.
+ 3. Describe a medical screening task.
+ 4. Click on "Generate Outreach Emails" to create draft emails to patients (.txt files with email drafts).
+ """)
+
+ # Function to read the README.md file
+ def read_readme():
+     readme_path = Path(__file__).parent / "README.md"
+
+     if readme_path.exists():
+         with open(readme_path, 'r') as f:
+             readme_content = f.read()
+         return readme_content
+     else:
+         return "README.md file not found in the project directory."
+
+ # Function to embed SVG images directly into the markdown content
+ def fix_svg_images_in_markdown(markdown_content):
+     import re
+
+     # Find SVG image tags in the markdown content
+     svg_pattern = r'<img[^>]*src="([^"]*\.svg)"[^>]*>'
+
+     def replace_with_embedded_svg(match):
+         img_tag = match.group(0)
+         src_match = re.search(r'src="([^"]*)"', img_tag)
+         if not src_match:
+             return img_tag
+
+         src_path = src_match.group(1)
+         width_match = re.search(r'width="([^"]*)"', img_tag)
+         width = width_match.group(1) if width_match else "100%"
+
+         # Construct the full path to the image
+         img_path = Path(__file__).parent / src_path
+
+         if img_path.exists():
+             try:
+                 # Read the SVG content directly
+                 with open(img_path, 'r') as f:
+                     svg_content = f.read()
+
+                 # Create a custom HTML component for the SVG with proper styling
+                 return f"""<div style="text-align:center; margin:20px 0;">
+                     <div style="max-width:{width}px; margin:0 auto;">
+                         {svg_content}
+                     </div>
+                 </div>"""
+             except Exception as e:
+                 return f"""<div style="text-align:center; color:red; padding:10px;">
+                     Error loading SVG image: {e}
+                 </div>"""
+         else:
+             return f"""<div style="text-align:center; color:red; padding:10px;">
+                 Image not found: {src_path}
+             </div>"""
+
+     # Replace all SVG image tags with embedded SVG content
+     return re.sub(svg_pattern, replace_with_embedded_svg, markdown_content)
+
+ # Create tabs
+ tab1, tab2 = st.tabs(["Healthcare Outreach App", "README"])
+
+ # Initialize session state for prompts if not already present
+ if 'user_proxy_prompt' not in st.session_state:
+     st.session_state.user_proxy_prompt = USER_PROXY_PROMPT
+ if 'epidemiologist_prompt' not in st.session_state:
+     st.session_state.epidemiologist_prompt = EPIDEMIOLOGIST_PROMPT
+ if 'doctor_critic_prompt' not in st.session_state:
+     st.session_state.doctor_critic_prompt = DOCTOR_CRITIC_PROMPT
+ if 'outreach_email_prompt' not in st.session_state:
+     st.session_state.outreach_email_prompt = OUTREACH_EMAIL_PROMPT_TEMPLATE
+
+ # Main Healthcare App Tab (Tab 1)
+ with tab1:
+     # --- Activity/log screen for agent communication ---
+     st.markdown("### Activity Log")
+     # Create a container with fixed height and a scrollbar for logs
+     log_container = st.container()
+     with log_container:
+         # Use an expander that's open by default to contain the log
+         with st.expander("Real-time Log", expanded=True):
+             log_placeholder = st.empty()
+
+     # --- Move user inputs, instructions, and CSV column info to the sidebar ---
+     with st.sidebar:
+         # Section for customizing prompts at the top of the sidebar
+         st.markdown("### Customize Agent Prompts")
+         st.caption("The agents use LLMs and natural language understanding (NLU) to organize the tasks they need to accomplish. You can modify the prompts for each agent below; these prompts are given to the agents so that they can work together to produce the final outreach emails for the preventative healthcare task at hand.")
+
+         # User Proxy Prompt
+         with st.expander("User Proxy Prompt"):
+             user_prompt = st.text_area(
+                 "User Proxy Prompt",
+                 value=st.session_state.user_proxy_prompt,
+                 height=300,
+                 key="user_proxy_input",
+                 label_visibility="hidden",
+                 # Add these style properties to preserve whitespace formatting
+                 help="",
+                 placeholder="",
+                 disabled=False,
+                 # Use CSS (injected below) to preserve whitespace formatting
+                 max_chars=None
+             )
+             st.session_state.user_proxy_prompt = user_prompt
+
+         # Epidemiologist Prompt
+         with st.expander("Epidemiologist Prompt"):
+             epi_prompt = st.text_area(
+                 "Epidemiologist Prompt",
+                 value=st.session_state.epidemiologist_prompt,
+                 height=300,
+                 key="epidemiologist_input",
+                 label_visibility="hidden",
+                 help="",
+                 placeholder="",
+                 disabled=False,
+                 max_chars=None
+             )
+             st.session_state.epidemiologist_prompt = epi_prompt
+
+         # Doctor Critic Prompt
+         with st.expander("Doctor Critic Prompt"):
+             doc_prompt = st.text_area(
+                 "Doctor Critic Prompt",
+                 value=st.session_state.doctor_critic_prompt,
+                 height=300,
+                 key="doctor_critic_input",
+                 label_visibility="hidden",
+                 help="",
+                 placeholder="",
+                 disabled=False,
+                 max_chars=None
+             )
+             st.session_state.doctor_critic_prompt = doc_prompt
+
+         # Outreach Email Prompt Template
+         with st.expander("Email Template Prompt"):
+             email_prompt = st.text_area(
+                 "Email Template Prompt",
+                 value=st.session_state.outreach_email_prompt,
+                 height=300,
+                 key="email_template_input",
+                 label_visibility="hidden",
+                 help="",
+                 placeholder="",
+                 disabled=False,
+                 max_chars=None
+             )
+             st.session_state.outreach_email_prompt = email_prompt
+
+         # Custom CSS to preserve whitespace in text areas while ensuring content fits
+         st.markdown("""
+         <style>
+             .stTextArea textarea {
+                 font-family: monospace;
+                 white-space: pre-wrap !important; /* Preserve whitespace but allow wrapping */
+                 word-wrap: break-word !important; /* Break words to the next line if needed */
+                 line-height: 1.4;
+                 tab-size: 2; /* Reduce tab size to save space */
+                 padding: 8px;
+                 font-size: 0.9em; /* Slightly smaller font to fit more content */
+             }
+         </style>
+         """, unsafe_allow_html=True)
+
+         # Reset prompts button
+         if st.button("Reset Prompts to Default"):
+             st.session_state.user_proxy_prompt = USER_PROXY_PROMPT
+             st.session_state.epidemiologist_prompt = EPIDEMIOLOGIST_PROMPT
+             st.session_state.doctor_critic_prompt = DOCTOR_CRITIC_PROMPT
+             st.session_state.outreach_email_prompt = OUTREACH_EMAIL_PROMPT_TEMPLATE
+             st.rerun()
+
+         st.markdown("---")
+
+         # Now add the "Get started" section after the prompts
+         st.header("Patient Data and Screening Task")
+
+         st.caption("Required CSV columns: patient_id, First Name, Last Name, Email, Patient diagnosis summary, age, gender, condition")
+
+         # Create a container for the default dataset option to control its appearance
+         default_dataset_container = st.container()
+
+         # Add the file upload option after the default dataset option
+         uploaded_file = st.file_uploader("Upload your own CSV file with patient data", type=["csv"])
+
+         # If a file is uploaded, show a message and disable the default checkbox
+         if uploaded_file is not None:
+             # Visual indication that custom data is being used
+             st.success("✅ Using your uploaded file")
+
+             # Disable the default dataset option with clear visual feedback
+             with default_dataset_container:
+                 st.markdown("""
+                 <div style="opacity: 0.5; pointer-events: none;">
+                     <input type="checkbox" disabled> Use default dataset (data/patients.csv)
+                     <div style="font-size: 0.8em; color: #999; font-style: italic;">
+                         Disabled because custom file is uploaded
+                     </div>
+                 </div>
+                 """, unsafe_allow_html=True)
+
+             # Set use_default to False when a file is uploaded
+             use_default = False
+         else:
+             # No file uploaded, show the normal checkbox
+             with default_dataset_container:
+                 use_default = st.checkbox("Use default dataset (data/patients.csv)", value=True)
+
+         screening_task = st.text_input("Enter the medical screening task (e.g., 'Colonoscopy screening')", "")
+
+         # Add the contact information section
+         st.markdown("---")
+         st.subheader("Healthcare Provider Contact Information")
+         st.caption("This information will appear in the emails sent to patients")
+
+         # Create three columns for contact info fields
+         col1, col2, col3 = st.columns(3)
+
+         with col1:
+             provider_name = st.text_input("Provider Name", "Benjamin Consolvo")
+
+         with col2:
+             provider_email = st.text_input("Provider Email", "[email protected]")
+
+         with col3:
+             provider_phone = st.text_input("Provider Phone", "123-456-7890")
+
+         # Validate input fields before enabling the button
+         required_fields_empty = (
+             screening_task.strip() == "" or
+             provider_name.strip() == "" or
+             provider_email.strip() == "" or
+             provider_phone.strip() == ""
+         )
+
+         if required_fields_empty:
+             st.warning("Please fill in all required fields before proceeding.")
+         st.markdown("---")
+         # The button lives in the sidebar - disabled if required fields are empty
+         generate = st.button("Generate Outreach Emails", disabled=required_fields_empty)
+
+     # Explicitly set environment variable to avoid TTY errors
+     os.environ["PYTHONUNBUFFERED"] = "1"
+
+     # Run the generation logic when the button is clicked. (st.tabs does not
+     # expose an "active tab" flag, so the earlier tab1._active check would
+     # fail at runtime; gating on the button alone is sufficient here.)
+     if generate:
+         # Since the button can only be clicked when all fields are filled,
+         # we don't need additional validation here
+
+         # Hugging Face secrets
+         api_key = st.secrets["OPENAI_API_KEY"]
+         base_url = st.secrets["OPENAI_BASE_URL"]
+
+         # --- Initialize log ---
+         log_messages = []
+
+         def log(msg):
+             log_messages.append(msg)
+             # Show all messages in the scrollable container with better contrast
+             log_placeholder.markdown(
+                 f"""
+                 <div style="height: 400px; overflow-y: auto; border: 1px solid #cccccc;
+                             padding: 15px; border-radius: 5px; background-color: rgba(240, 242, 246, 0.4);
+                             color: inherit; font-family: monospace;">
+                     {"<br>".join(log_messages)}
+                 </div>
+                 """,
+                 unsafe_allow_html=True
+             )
+
+         # Capture stdout/stderr during the workflow
+         stdout_buffer = io.StringIO()
+         stderr_buffer = io.StringIO()
+         with contextlib.redirect_stdout(stdout_buffer), contextlib.redirect_stderr(stderr_buffer):
+             if not screening_task:
+                 st.error("Please enter a medical screening task.")
+             elif not uploaded_file and not use_default:
+                 st.error("Please upload a CSV file or select the default dataset.")
+             else:
+                 # Load patient data
+                 if uploaded_file:
+                     patients_file = uploaded_file
+                 else:
+                     # Use an absolute path for the default dataset
+                     patients_file = os.path.join(os.path.dirname(__file__), "data/patients.csv")
+
+                 try:
+                     patients_df = pd.read_csv(patients_file)
+                 except Exception as e:
+                     st.error(f"Error reading the CSV file: {e}")
+                     st.stop()
+
+                 # Validate required columns
+                 required_columns = [
+                     'patient_id', 'First Name', 'Last Name', 'Email',
+                     'Patient diagnosis summary', 'age', 'gender', 'condition'
+                 ]
+                 if not all(col in patients_df.columns for col in required_columns):
+                     st.error(f"The uploaded CSV file is missing required columns: {required_columns}")
+                     st.stop()
+
+                 # Load configurations
+                 llama_filter_dict = {"model": ["meta-llama/Llama-3.3-70B-Instruct"]}
+                 deepseek_filter_dict = {"model": ["deepseek-ai/DeepSeek-R1-Distill-Llama-70B"]}
+                 config_list_llama = get_configs("OAI_CONFIG_LIST.json", llama_filter_dict)
+                 config_list_deepseek = get_configs("OAI_CONFIG_LIST.json", deepseek_filter_dict)
+
+                 # Ensure the API key from secrets is used
+                 for config in config_list_llama:
+                     config["api_key"] = api_key
+                 for config in config_list_deepseek:
+                     config["api_key"] = api_key
+
+                 # --- Log agent communication ---
+                 log("🟢 <b>Starting agent workflow...</b>")
+                 log("🧑‍⚕️ <b>Screening task:</b> " + screening_task)
+                 log("📄 <b>Loaded patient data:</b> {} records".format(len(patients_df)))
+
+                 # Generate criteria for outreach - pass the custom prompts
+                 log("🤖 <b>Agent (Llama):</b> Generating outreach criteria...")
+                 criteria = asyncio.run(target_patients_outreach(
+                     screening_task, config_list_llama, config_list_deepseek,
+                     log_fn=log if "log_fn" in target_patients_outreach.__code__.co_varnames else None,
+                     user_proxy_prompt=st.session_state.user_proxy_prompt,
+                     epidemiologist_prompt=st.session_state.epidemiologist_prompt,
+                     doctor_critic_prompt=st.session_state.doctor_critic_prompt
+                 ))
+                 log("✅ <b>Criteria generated.</b>")
+
+                 # Find patients matching the criteria
+                 log("🤖 <b>Agent (Llama):</b> Filtering patients based on criteria...")
+                 filtered_patients, arguments_criteria = asyncio.run(find_patients(
+                     criteria, config_list_llama,
+                     log_fn=log if "log_fn" in find_patients.__code__.co_varnames else None,
+                     patients_file_path=patients_file  # Use correct parameter name: patients_file_path
+                 ))
+                 log("✅ <b>Patients filtered.</b>")
+
+                 if filtered_patients.empty:
+                     log("⚠️ <b>No patients matched the criteria.</b>")
+                     st.warning("No patients matched the criteria.")
+                 else:
+                     # Initialize the OpenAI client
+                     openai_client = OpenAI(api_key=api_key, base_url=base_url)
+
+                     # Generate outreach emails - pass the custom email template
+                     log("🤖 <b>Agent (Llama):</b> Generating outreach emails...")
+                     asyncio.run(write_outreach_emails(
+                         filtered_patients,
+                         screening_task,
+                         arguments_criteria,
+                         openai_client,
+                         config_list_llama[0]['model'],
+                         phone=provider_phone,  # Provider's phone from the form
+                         email=provider_email,  # Provider's email from the form
+                         name=provider_name,    # Provider's name from the form
+                         log_fn=log if "log_fn" in write_outreach_emails.__code__.co_varnames else None,
+                         outreach_email_prompt_template=st.session_state.outreach_email_prompt
+                     ))
+
+                     # Make sure the data directory exists (for Hugging Face Spaces)
+                     data_dir = os.path.join(os.path.dirname(__file__), "data")
+                     os.makedirs(data_dir, exist_ok=True)
+
+                     # Generate expected email filenames based on the filtered patients
+                     expected_email_files = []
+                     for _, patient in filtered_patients.iterrows():
+                         # Construct the expected filename based on patient data
+                         firstname = patient['First Name']
+                         lastname = patient['Last Name']
+                         filename = f"{firstname}_{lastname}_email.txt"
+                         if os.path.exists(os.path.join(data_dir, filename)):
+                             expected_email_files.append(filename)
+
+                     # Use only the email files for patients in the filtered DataFrame
+                     email_files = expected_email_files
+
+                     if email_files:
+                         log("✅ <b>Outreach emails generated successfully:</b> {} emails created".format(len(email_files)))
+                         st.success(f"{len(email_files)} outreach emails have been generated!")
+
+                         # Create a section for downloads
+                         st.markdown("### Download Generated Emails")
+
+                         # Store email content in session state to persist across interactions
+                         if 'email_contents' not in st.session_state:
+                             st.session_state.email_contents = {}
+                             for email_file in email_files:
+                                 with open(os.path.join(data_dir, email_file), 'r') as f:
+                                     st.session_state.email_contents[email_file] = f.read()
+
+                         # Create the ZIP file only once and store it in session state
+                         if 'zip_buffer' not in st.session_state:
+                             import zipfile
+                             zip_buffer = io.BytesIO()
+                             with zipfile.ZipFile(zip_buffer, 'w', zipfile.ZIP_DEFLATED) as zip_file:
+                                 for email_file, content in st.session_state.email_contents.items():
+                                     zip_file.writestr(email_file, content)
+                             st.session_state.zip_buffer = zip_buffer.getvalue()
+
+                         # Base64-encode the ZIP file
+                         import base64
+                         b64_zip = base64.b64encode(st.session_state.zip_buffer).decode()
+
+                         # HTML for the ZIP download - use components.html instead of st.markdown
+                         zip_html = f"""
+                         <div style="margin-bottom: 20px;">
+                             <a href="data:application/zip;base64,{b64_zip}"
+                                download="patient_emails.zip"
+                                style="text-decoration: none; display: inline-block; padding: 12px 18px;
+                                       border: 1px solid #ddd; border-radius: 4px; background-color: #4CAF50;
+                                       color: white; font-size: 16px; font-weight: bold; text-align: center;">
+                                 📦 Download All Emails as ZIP
+                             </a>
+                         </div>
+                         """
+
+                         components.html(zip_html, height=70)
+
+                         st.markdown("---")
+                         st.markdown("#### Individual Email Downloads")
+
+                         # Generate HTML for individual email downloads
+                         individual_html = """
+                         <div style="display: flex; flex-wrap: wrap; gap: 8px;">
+                         """
+
+                         # Generate download links for all emails
+                         for i, email_file in enumerate(email_files):
+                             file_content = st.session_state.email_contents.get(email_file, "")
+                             # Create a base64-encoded version of the file content
+                             b64_content = base64.b64encode(file_content.encode()).decode()
+
+                             # Extract a display name (First + Last name)
+                             name_parts = email_file.split('_')[:2]  # First and last name parts
+                             display_name = " ".join(name_parts)     # Join with a space: "First Last"
+
+                             # Add the download link to the HTML
+                             individual_html += f"""
+                             <a href="data:text/plain;base64,{b64_content}"
+                                download="{email_file}"
+                                style="text-decoration: none; display: inline-block; margin: 4px; padding: 8px 12px;
+                                       border: 1px solid #ddd; border-radius: 4px; background-color: #f0f2f6;
+                                       color: #262730; font-size: 14px; text-align: center; min-width: 120px;">
+                                 {display_name}
+                             </a>
+                             """
+
+                         individual_html += """
+                         </div>
+                         """
+
+                         # Estimate the component height from the number of emails,
+                         # with extra room for potentially longer names
+                         components.html(individual_html, height=100 + (len(email_files) // 4) * 60)
+
+                     else:
+                         log("⚠️ <b>Email generation process completed but no email files were found.</b>")
+                         st.warning("The email generation process completed but no email files were found in the data directory. This might indicate an issue with the email generation or file saving process.")
+
+         # After the workflow, append captured output
+         std_output = stdout_buffer.getvalue()
+         std_error = stderr_buffer.getvalue()
+
+         if std_output:
+             log_messages.append("<b>Terminal Output:</b>")
+             for line in std_output.splitlines():
+                 if line.strip():  # Skip empty lines
+                     log_messages.append(line)
+             # Update the log display with all messages using better contrast
+             log_placeholder.markdown(
+                 f"""
+                 <div style="height: 400px; overflow-y: auto; border: 1px solid #cccccc;
+                             padding: 15px; border-radius: 5px; background-color: rgba(240, 242, 246, 0.4);
+                             color: inherit; font-family: monospace;">
+                     {"<br>".join(log_messages)}
+                 </div>
+                 """,
+                 unsafe_allow_html=True
+             )
+
+         if std_error:
+             log_messages.append("<b style='color:#ff6b6b;'>Terminal Error:</b>")
+             for line in std_error.splitlines():
+                 if line.strip():  # Skip empty lines
+                     log_messages.append(f"<span style='color:#ff6b6b;'>{line}</span>")
+             # Update the log display with all messages
+             log_placeholder.markdown(
+                 f"""
+                 <div style="height: 400px; overflow-y: auto; border: 1px solid #cccccc;
+                             padding: 15px; border-radius: 5px; background-color: rgba(240, 242, 246, 0.4);
+                             color: inherit; font-family: monospace;">
+                     {"<br>".join(log_messages)}
+                 </div>
+                 """,
+                 unsafe_allow_html=True
+             )
+
+ # README Tab (Tab 2)
+ with tab2:
+     readme_content = read_readme()
+
+     # Process the README content to properly handle SVG images
+     readme_with_embedded_svgs = fix_svg_images_in_markdown(readme_content)
+
+     # Use unsafe_allow_html=True to render HTML content properly
+     st.markdown(readme_with_embedded_svgs, unsafe_allow_html=True)
+
+     # CSS to ensure SVGs are responsive and display properly
+     st.markdown("""
+     <style>
+         svg {
+             max-width: 100%;
+             height: auto;
+         }
+     </style>
+     """, unsafe_allow_html=True)

intelpreventativehealthcare.py ADDED
@@ -0,0 +1,649 @@
+ # A healthcare system simulation that uses AI agents to manage patient outreach
+ # Author: Benjamin Consolvo
+ # Originally created in 2025
+ # Original code and idea from Mike Lynch on Medium (heavily modified):
+ # https://medium.com/@micklynch_6905/hospitalgpt-managing-a-patient-population-with-autogen-powered-by-gpt-4-mixtral-8x7b-ef9f54f275f1
+ # https://github.com/micklynch/hospitalgpt
+
+ import os
+ import asyncio
+ import pandas as pd
+ import json
+ import argparse
+ from typing import Callable, Dict, Any, List, Tuple
+ from autogen import (
+     AssistantAgent,
+     UserProxyAgent,
+     config_list_from_json,
+     GroupChat,
+     GroupChatManager,
+     register_function,
+ )
+ from openai import OpenAI
+ from prompts.epidemiologist_prompt import EPIDEMIOLOGIST_PROMPT
+ from prompts.doctor_critic_prompt import DOCTOR_CRITIC_PROMPT
+ from prompts.user_proxy_prompt import USER_PROXY_PROMPT
+ from prompts.outreach_email_prompt import OUTREACH_EMAIL_PROMPT_TEMPLATE
+ import aiofiles  # For asynchronous file writing
+ import functools  # For wrapping synchronous functions in async
+
+ # Export the prompt variables for use in the app
+ __all__ = [
+     "get_configs", "target_patients_outreach", "find_patients",
+     "write_outreach_emails", "USER_PROXY_PROMPT", "EPIDEMIOLOGIST_PROMPT",
+     "DOCTOR_CRITIC_PROMPT", "OUTREACH_EMAIL_PROMPT_TEMPLATE"
+ ]
+
+ def get_configs(
+     env_or_file: str,
+     filter_dict: Dict[str, Any]
+ ) -> List[Dict[str, Any]]:
+     """
+     Load a model configuration list from a JSON file.
+
+     Args:
+         env_or_file (str): Path to the JSON file or environment variable name.
+         filter_dict (Dict[str, Any]): Dictionary used to filter the configuration list.
+
+     Returns:
+         List[Dict[str, Any]]: Filtered list of configuration dictionaries.
+     """
+     return config_list_from_json(env_or_file=env_or_file, filter_dict=filter_dict)
+
+ async def target_patients_outreach(
+     target_screening: str,
+     config_list_llama: List[Dict[str, Any]],
+     config_list_deepseek: List[Dict[str, Any]],
+     log_fn=None,
+     user_proxy_prompt=USER_PROXY_PROMPT,
+     epidemiologist_prompt=EPIDEMIOLOGIST_PROMPT,
+     doctor_critic_prompt=DOCTOR_CRITIC_PROMPT
+ ) -> str:
+     """
+     Determines the criteria for patient outreach based on a screening task.
+
+     This function facilitates a conversation between a user, an epidemiologist,
+     and a doctor critic to define the criteria for patient outreach. The output
+     criteria from the doctor and epidemiologist include minimum age, maximum age,
+     gender, and a possible previous condition.
+
+     Example:
+
+         criteria = asyncio.run(target_patients_outreach(
+             "Type 2 diabetes screening", config_list_llama, config_list_deepseek))
+
+     Args:
+         target_screening (str): The type of screening task (e.g., "Type 2 diabetes screening").
+         config_list_llama (List[Dict[str, Any]]): Configuration list for the Llama model.
+         config_list_deepseek (List[Dict[str, Any]]): Configuration list for the DeepSeek model.
+         log_fn (callable, optional): Function for logging messages.
+         user_proxy_prompt (str, optional): Custom prompt for the user proxy agent.
+         epidemiologist_prompt (str, optional): Custom prompt for the epidemiologist agent.
+         doctor_critic_prompt (str, optional): Custom prompt for the doctor critic agent.
+
+     Returns:
+         str: The defined criteria for patient outreach.
+     """
+     llm_config_llama: Dict[str, Any] = {
+         "cache_seed": 41,
+         "temperature": 0,
+         "config_list": config_list_llama,
+         "timeout": 120,
+     }
+
+     llm_config_deepseek: Dict[str, Any] = {
+         "cache_seed": 42,
+         "temperature": 0,
+         "config_list": config_list_deepseek,
+         "timeout": 120,
+     }
+
+     user_proxy = UserProxyAgent(
+         name="User",
+         is_termination_msg=lambda x: (
+             x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE")
+         ),
+         human_input_mode="NEVER",
+         description=user_proxy_prompt,  # Use the custom prompt
+         code_execution_config=False,
+         max_consecutive_auto_reply=1,
+     )
+
+     epidemiologist = AssistantAgent(
+         name="Epidemiologist",
+         system_message=epidemiologist_prompt,  # Use the custom prompt
+         llm_config=llm_config_llama,
+         code_execution_config=False,
+         is_termination_msg=lambda x: (
+             x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE")
+         ),
+     )
+
+     critic = AssistantAgent(
+         name="DoctorCritic",
+         system_message=doctor_critic_prompt,  # Use the custom prompt
+         llm_config=llm_config_deepseek,
+         human_input_mode="NEVER",
+         code_execution_config=False,
+         is_termination_msg=lambda x: (
+             x.get("content", "") and x.get("content", "").rstrip().endswith("TERMINATE")
+         ),
+     )
+
+     groupchat = GroupChat(
+         agents=[user_proxy, epidemiologist, critic],
+         messages=[]
+     )
+     manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config_llama)
+
+     user_proxy.initiate_chat(
+         manager,
+         message=target_screening,
+     )
+     if log_fn:
+         log_fn("Agent conversation complete.")
+     user_proxy.stop_reply_at_receive(manager)
+     result = user_proxy.last_message()["content"]
+     if log_fn:
+         log_fn(f"Criteria result: {result}")
+     return result
+
+ def get_patients_from_criteria(
+     patients_file: str,
+     min_age: int,
+     max_age: int,
+     criteria: str,
+     gender: str
+ ) -> pd.DataFrame:
+     """
+     Filters patient data from a CSV file based on specified criteria.
+
+     This function reads patient data from a CSV file and filters it based on
+     age range, gender, and a specific condition.
+
+     Example:
+
+         filtered_patients = get_patients_from_criteria(
+             patients_file="data/patients.csv",
+             min_age=40,
+             max_age=70,
+             criteria="Adenomatous Polyps",
+             gender="None"
+         )
+
+     Args:
+         patients_file (str): Path to the CSV file containing patient data.
+         min_age (int): Minimum age for filtering.
+         max_age (int): Maximum age for filtering.
+         criteria (str): Condition to filter patients by.
+         gender (str): Gender to filter patients by ("M" or "F"); any other
+             value disables the gender filter.
+
+     Returns:
+         pd.DataFrame: A DataFrame containing the filtered patient data.
+     """
+     required_columns = [
+         'patient_id', 'First Name', 'Last Name', 'Email',
+         'Patient diagnosis summary', 'age', 'gender', 'condition'
+     ]
+
+     # Support both a file path (str) and a file-like object (e.g., from Streamlit)
+     if hasattr(patients_file, "read"):
+         # Reset the pointer in case the file has been read before
+         patients_file.seek(0)
+     patients_df = pd.read_csv(patients_file)
+
+     for column in required_columns:
+         if column not in patients_df.columns:
+             raise ValueError(f"Missing required column: {column}")
+
+     # Lowercase all condition text for case-insensitive matching
+     patients_df['condition'] = patients_df['condition'].str.lower()
+     criteria = criteria.lower()
+
+     # Filter by condition match
+     condition_filter = patients_df['condition'].str.contains(criteria, na=False)
+
+     # Filter by age range
+     age_filter = (patients_df['age'] >= min_age) & (patients_df['age'] <= max_age)
+
+     # Combine the filters with OR logic
+     combined_filter = age_filter | condition_filter
+
+     if gender in ['M', 'F']:
+         gender_filter = patients_df['gender'].str.upper() == gender.upper()
+         combined_filter = combined_filter & gender_filter
+
+     return patients_df[combined_filter]
+
+ # Note: this local helper intentionally shadows the `register_function`
+ # imported from autogen above; the two-agent registration below is all
+ # this module needs.
+ def register_function(
+     assistant: AssistantAgent,
+     user_proxy: UserProxyAgent,
+     func: Callable,
+     name: str,
+     description: str
+ ) -> None:
+     """
+     Register a function so that an assistant agent can propose it as a tool
+     call and a user proxy agent can execute it.
+
+     Example:
+         register_function(
+             assistant=assistant_agent,
+             user_proxy=user_proxy_agent,
+             func=my_function,
+             name="my_function",
+             description="This is a test function."
+         )
+
+     Args:
+         assistant (AssistantAgent): The assistant agent to register the function with.
+         user_proxy (UserProxyAgent): The user proxy agent to register the function with.
+         func (Callable): The function to register.
+         name (str): The name of the function.
+         description (str): A description of the function.
+     """
+     assistant.register_for_llm(
+         name=name,
+         description=description
+     )(func)
+
+     user_proxy.register_for_execution(
+         name=name
+     )(func)
+
+     return None
+
+ async def find_patients(
+     criteria: str,
+     config_list_llama: List[Dict[str, Any]],
+     log_fn=None,
+     patients_file_path=None  # Can be a path or a file-like object
+ ) -> Tuple[pd.DataFrame, Dict[str, Any]]:
+     """
+     Finds patients matching specific criteria using agents.
+
+     This function uses a user proxy agent and a data analyst agent to filter
+     patient data based on the provided criteria.
+
+     Example:
+         patients_df, arguments = asyncio.run(
+             find_patients("Patients aged 40 to 70", config_list_llama))
+
+     Args:
+         criteria (str): The criteria for filtering patients.
+         config_list_llama (List[Dict[str, Any]]): Configuration list for the Llama model.
+         log_fn (callable, optional): Function for logging messages.
+         patients_file_path: Path to the patient data file, or a file-like object.
+
+     Returns:
+         Tuple[pd.DataFrame, Dict[str, Any]]: The filtered patient data and the
+         tool-call arguments used to produce it.
+     """
+     # Set up a temporary file path for the agent to use
+     temp_file_path = None
+
+     # If we have a file-like object (from Streamlit), save it to a temp file
+     if patients_file_path is not None and hasattr(patients_file_path, "read"):
+         try:
+             # Create the data directory if it doesn't exist
+             os.makedirs("data", exist_ok=True)
+             temp_file_path = os.path.join("data", "temp_patients.csv")
+
+             # Reset the file pointer and read with pandas
+             patients_file_path.seek(0)
+             temp_df = pd.read_csv(patients_file_path)
+
+             # Save to the temp location
+             temp_df.to_csv(temp_file_path, index=False)
+
+             if log_fn:
+                 log_fn(f"Saved uploaded file to temporary location: {temp_file_path}")
+
+             # Update the criteria to include the file path
+             criteria = f"The patient data is available at {temp_file_path}. " + criteria
+         except Exception as e:
+             if log_fn:
+                 log_fn(f"Error preparing patient file: {str(e)}")
+             raise
+     elif isinstance(patients_file_path, str):
+         # It's a regular file path
+         temp_file_path = patients_file_path
+         criteria = f"The patient data is available at {temp_file_path}. " + criteria
+
+     # Configure the LLM
+     llm_config_llama: Dict[str, Any] = {
+         "cache_seed": 43,
+         "temperature": 0,
+         "config_list": config_list_llama,
+         "timeout": 120,
+         "tools": []
+     }
+
+     user_proxy = UserProxyAgent(
+         name="user_proxy",
+         code_execution_config={"last_n_messages": 2, "work_dir": "data/", "use_docker": False},
+         is_termination_msg=lambda x: x.get("content", "") and x.get(
+             "content", "").rstrip().endswith("TERMINATE"),
+         human_input_mode="NEVER",
+         llm_config=llm_config_llama,
+         # reflect_on_tool_use=True
+     )
+
+     data_analyst = AssistantAgent(
+         name="data_analyst",
+         code_execution_config={
+             "last_n_messages": 2,
+             "work_dir": "data/",
+             "use_docker": False},
+         llm_config=llm_config_llama,
+         # reflect_on_tool_use=True
+     )
+
+     register_function(
+         data_analyst,
+         user_proxy,
+         get_patients_from_criteria,
+         "get_patients_from_criteria",
+         "Extract and filter patient information based on criteria."
+     )
+
+     # --- Properly extract arguments from the agent conversation ---
+     arguments = None  # Ensure arguments is defined in this scope
+
+     def user_proxy_reply(message: str):
+         nonlocal temp_file_path
+         try:
+             if "arguments:" in message:
+                 arguments_str = message.split("arguments:")[1].strip().split("\n")[0]
+                 # Note: eval on model output is a trusted-input shortcut here
+                 args = eval(arguments_str)
+
+                 # Override the file path with our temp file if available
+                 if temp_file_path:
+                     args['patients_file'] = temp_file_path
+                     if log_fn:
+                         log_fn(f"Using patient data from: {temp_file_path}")
+
+                 return "Tool call received. \nTERMINATE", args
+         except Exception as e:
+             if log_fn:
+                 log_fn(f"Error extracting arguments: {e}")
+             return f"Error executing function: {str(e)} \nTERMINATE"
+         return "Function call not recognized. \nTERMINATE"
+
+     user_proxy.reply_handler = user_proxy_reply
+     if log_fn:
+         log_fn(f"Set up reply handler with temp file path: {temp_file_path}")
+
+     groupchat = GroupChat(agents=[user_proxy, data_analyst], messages=[])
+     manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config_llama)
+
+     chat_output = user_proxy.initiate_chat(data_analyst, message=f"{criteria}")
+     user_proxy.stop_reply_at_receive(manager)
+     if log_fn:
+         log_fn("Agent conversation for patient filtering complete.")
+
+     # Always extract arguments from the chat history after the chat
+     if chat_output and hasattr(chat_output, "chat_history"):
+         chat_history = chat_output.chat_history
+         for message in chat_history:
+             if "tool_calls" in message:
+                 tool_calls = message["tool_calls"]
+                 for tool_call in tool_calls:
+                     function = tool_call.get("function", {})
+                     try:
+                         arguments = json.loads(function.get("arguments", None))
+                     except Exception:
+                         arguments = None
+                     if arguments:
+                         break
+             if arguments:
+                 break
+
+     if not arguments:
+         if log_fn:
+             log_fn("Arguments were not populated during the chat process.")
+         raise ValueError("Arguments were not populated during the chat process.")
+
+     # Always use the temp file path for the actual data load if available
+     if temp_file_path and arguments:
+         arguments['patients_file'] = temp_file_path
+
+     filtered_df = get_patients_from_criteria(
+         patients_file=arguments['patients_file'],
+         min_age=arguments['min_age'],
+         max_age=arguments['max_age'],
+         criteria=arguments['criteria'],
+         gender=arguments['gender']
+     )
+     if log_fn:
+         log_fn(f"Filtered {len(filtered_df)} patients.")
+     return filtered_df, arguments
+
+ async def generate_email(openai_client, patient, email_prompt, model):
+     """
+     Asynchronously generate an email using the OpenAI client.
+
+     Args:
+         openai_client (OpenAI): The OpenAI client instance.
+         patient (dict): The patient data.
+         email_prompt (str): The email prompt to send to the model.
+         model (str): The model to use for generation.
+
+     Returns:
+         str: The generated email content.
+     """
+     # Wrap the synchronous `create` method so it can run in an executor
+     create_completion = functools.partial(
+         openai_client.chat.completions.create,
+         model=model,
+         messages=[{"role": "user", "content": email_prompt}],
+         stream=False,
+         seed=42,
+         temperature=0  # Ensures a consistent output for the email (limits creativity)
+     )
+     chat_completion = await asyncio.get_event_loop().run_in_executor(None, create_completion)
+     return chat_completion.choices[0].message.content
+
+
+ async def write_email_to_file(file_path, patient, email_content):
+     """
+     Asynchronously write an email to a file.
+
+     Args:
+         file_path (str): The path to the file.
+         patient (dict): The patient data.
+         email_content (str): The email content to write.
+
+     Returns:
+         None
+     """
+     async with aiofiles.open(file_path, "w") as f:
+         await f.write(f"Name: {patient['First Name']} {patient['Last Name']}\n")
+         await f.write(f"Patient ID: {patient['patient_id']}\n")
+         await f.write(f"Email: {patient['Email']}\n")
+         await f.write(email_content)
+         await f.write("\n")
+         await f.write("-----------------------------------------")
+
+
+ async def write_outreach_emails(
+     patient_details: pd.DataFrame,
+     user_proposal: str,
+     arguments_criteria: Dict[str, Any],
+     openai_client: OpenAI,
+     model: str,
+     phone: str = "123-456-7890",
+     email: str = "[email protected]",
+     name: str = "Benjamin Consolvo",
+     log_fn=None,
+     outreach_email_prompt_template=OUTREACH_EMAIL_PROMPT_TEMPLATE
+ ) -> None:
+     """
+     Asynchronously generates and writes outreach emails for patients.
+
+     This function generates personalized emails for patients based on their
+     details and the specified screening criteria. The emails are written to
+     individual text files asynchronously.
+
+     Args:
+         patient_details (pd.DataFrame): DataFrame containing patient details.
+         user_proposal (str): The type of screening task (e.g., "Colonoscopy screening").
+         arguments_criteria (Dict[str, Any]): The criteria used for filtering patients.
+         openai_client (OpenAI): The OpenAI client instance.
+         model (str): Model name to use for generation.
+         phone (str): Phone number to include in the outreach emails.
+         email (str): Email address to include in the outreach emails.
+         name (str): Name to include in the outreach emails.
+         log_fn (callable, optional): Function for logging messages.
+         outreach_email_prompt_template (str): Custom template for outreach emails.
+
+     Returns:
+         None
+     """
+     os.makedirs("data", exist_ok=True)
+     if patient_details.empty:
+         msg = "No patients found"
+         print(msg)
+         if log_fn:
+             log_fn(msg)
+         return
+
+     async def process_patient(patient):
+         # Ensure all required fields are present in the patient record
+         required_fields = ['First Name', 'Last Name', 'patient_id', 'Email']
+         for field in required_fields:
+             if field not in patient or pd.isna(patient[field]):
+                 msg = f"Skipping patient record due to missing field: {field}"
+                 print(msg)
+                 if log_fn:
+                     log_fn(msg)
+                 return
+
+         # Build the prompt from the template, skipping the patient on a bad template
+         try:
+             # Use the custom template instead of the default
+             email_prompt = outreach_email_prompt_template.format(
+                 patient=patient.to_dict(),
+                 arguments_criteria=arguments_criteria,
+                 first_name=patient["First Name"],
+                 last_name=patient["Last Name"],
+                 user_proposal=user_proposal,
+                 name=name,
+                 phone=phone,
+                 email=email
+             )
+         except KeyError as e:
+             msg = f"Error formatting email prompt: Missing key {e}. Skipping patient."
+             print(msg)
+             if log_fn:
+                 log_fn(msg)
+             return
+
+         msg = f'Generating email for {patient["First Name"]} {patient["Last Name"]}'
+         print(msg)
+         if log_fn:
+             log_fn(msg)
+         email_content = await generate_email(openai_client, patient, email_prompt, model)
+
+         file_path = f"data/{patient['First Name']}_{patient['Last Name']}_email.txt"
+         await write_email_to_file(file_path, patient, email_content)
+         if log_fn:
+             log_fn(f"Wrote email to {file_path}")
+
+     # One task per patient; all emails are generated concurrently
+     tasks = [process_patient(patient) for _, patient in patient_details.iterrows()]
+     await asyncio.gather(*tasks)
+
+     msg = "All emails have been written to the 'data/' directory."
+     print(msg)
+     if log_fn:
+         log_fn(msg)
+
+ def parse_arguments():
+     """
+     Parse command-line arguments for the script.
+
+     Returns:
+         argparse.Namespace: Parsed arguments.
+     """
+     parser = argparse.ArgumentParser(description="Run the Preventative Healthcare Intel script.")
+     parser.add_argument(
+         "--oai_config",
+         type=str,
+         required=True,
+         help="Path to the OAI_CONFIG_LIST.json file."
+     )
+     parser.add_argument(
+         "--target_screening",
+         type=str,
+         required=True,
+         help="The type of screening task (e.g., 'Colonoscopy screening')."
+     )
+     parser.add_argument(
+         "--patients_file",
+         type=str,
+         default="data/patients.csv",
+         help="Path to the CSV file containing patient data. Default is 'data/patients.csv'."
+     )
+     parser.add_argument(
+         "--phone",
+         type=str,
+         default="123-456-7890",
+         help="Phone number to include in the outreach emails. Default is '123-456-7890'."
+     )
+     parser.add_argument(
+         "--email",
+         type=str,
+         default="[email protected]",
+         help="Email address to include in the outreach emails. Default is '[email protected]'."
+     )
+     parser.add_argument(
+         "--name",
+         type=str,
+         default="Benjamin Consolvo",
+         help="Name to include in the outreach emails. Default is 'Benjamin Consolvo'."
+     )
+     return parser.parse_args()
+
+ if __name__ == "__main__":
+     # Parse command-line arguments
+     args = parse_arguments()
+
+     llama_filter_dict = {"model": ["meta-llama/Llama-3.3-70B-Instruct"]}
+     config_list_llama = get_configs(args.oai_config, llama_filter_dict)
+
+     deepseek_filter_dict = {"model": ["deepseek-ai/DeepSeek-R1-Distill-Llama-70B"]}
+     config_list_deepseek = get_configs(args.oai_config, deepseek_filter_dict)
+
+     # Validate the API key before initializing the OpenAI client
+     api_key = config_list_llama[0].get('api_key')
+
+     if not api_key:
+         config_list_llama[0]['api_key'] = config_list_deepseek[0]['api_key'] = api_key = os.environ.get("OPENAI_API_KEY")
+
+     # Get the criteria for the target screening.
+     # The user provides the screening task; the epidemiologist and doctor
+     # critic then define the criteria for the outreach.
+     filepath = os.path.join(os.getcwd(), args.patients_file)
+     criteria = f"The patient data is located here: {filepath}."
+     criteria += asyncio.run(target_patients_outreach(args.target_screening, config_list_llama, config_list_deepseek))
+
+     # The user proxy agent and data analyst filter the patients based on the
+     # criteria defined by the epidemiologist and doctor critic.
+     # (Fixed keyword: the parameter is named patients_file_path.)
+     patients_df, arguments_criteria = asyncio.run(find_patients(criteria, config_list_llama, patients_file_path=filepath))
+
+     # Initialize the OpenAI client
+     openai_client = OpenAI(
+         api_key=api_key,
+         base_url=config_list_llama[0]['base_url']
+     )
+
+     # Use an LLM to write the outreach emails to text files
+     asyncio.run(write_outreach_emails(
+         patients_df,
+         args.target_screening,
+         arguments_criteria,
+         openai_client,
+         config_list_llama[0]['model'],
+         phone=args.phone,
+         email=args.email,
+         name=args.name
+     ))

pyproject.toml ADDED
@@ -0,0 +1,20 @@
+ [project]
+ name = "agentchat-intel-preventative-healthcare"
+ version = "0.1.0"
+ description = "AutoGen Agents for Preventative Healthcare"
+ readme = "README.md"
+ requires-python = ">=3.10"
+ dependencies = [
+     "aiofiles>=24.1.0",
+     "anyio>=4.9.0",
+     "argparse>=1.4.0",
+     "asyncio>=3.4.3",
+     "autogen>=0.9",
+     "autogen-ext[openai]>=0.5.6",
+     "distro>=1.9.0",
+     "litellm[proxy]>=1.68.0",
+     "markitdown>=0.1.1",
+     "openai>=1.75.0",
+     "pandas>=2.2.3",
+     "streamlit>=1.25.0",
+ ]
requirements.txt CHANGED
@@ -1,3 +1,12 @@
- altair
+ distro
+ autogen
+ autogen-ext[openai]
+ litellm[proxy]
+ anyio
+ markitdown
  pandas
+ aiofiles
+ argparse
+ openai
+ asyncio
  streamlit
src/streamlit_app.py DELETED
@@ -1,40 +0,0 @@
- import altair as alt
- import numpy as np
- import pandas as pd
- import streamlit as st
-
- """
- # Welcome to Streamlit!
-
- Edit `/streamlit_app.py` to customize this app to your heart's desire :heart:.
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
-
- In the meantime, below is an example of what you can do with just a few lines of code:
- """
-
- num_points = st.slider("Number of points in spiral", 1, 10000, 1100)
- num_turns = st.slider("Number of turns in spiral", 1, 300, 31)
-
- indices = np.linspace(0, 1, num_points)
- theta = 2 * np.pi * num_turns * indices
- radius = indices
-
- x = radius * np.cos(theta)
- y = radius * np.sin(theta)
-
- df = pd.DataFrame({
-     "x": x,
-     "y": y,
-     "idx": indices,
-     "rand": np.random.randn(num_points),
- })
-
- st.altair_chart(alt.Chart(df, height=700, width=700)
-     .mark_point(filled=True)
-     .encode(
-         x=alt.X("x", axis=None),
-         y=alt.Y("y", axis=None),
-         color=alt.Color("idx", legend=None, scale=alt.Scale()),
-         size=alt.Size("rand", legend=None, scale=alt.Scale(range=[1, 150])),
-     ))