Benjamin Consolvo committed commit ec4cdd8 (1 parent: f6e3b3b)

readme updates

Files changed (1): README.md (+31 −30)
## AutoGen Multi-Agent Chat Preventative Healthcare

This is a multi-agent system built on top of [AutoGen](https://github.com/microsoft/autogen) agents, designed to automate and optimize preventative healthcare outreach. It uses multiple agents, large language models (LLMs), and asynchronous programming to streamline the process of identifying patients who meet specific screening criteria and generating personalized outreach emails.

The system uses an OpenAI-compatible API key and model endpoints from the inference service [Intel® AI for Enterprise Inference](https://github.com/opea-project/Enterprise-Inference), powered by Intel® Gaudi® AI accelerators.

Credit: Though heavily modified, the original idea comes from Mike Lynch on his [Medium blog](https://medium.com/@micklynch_6905/hospitalgpt-managing-a-patient-population-with-autogen-powered-by-gpt-4-mixtral-8x7b-ef9f54f275f1).

## Workflow

<p align="center">
<img width="700" src="images/prev_healthcare_4.drawio.svg">
</p>

1. **Define screening criteria**: After getting the general screening task from the user, the User Proxy Agent starts a conversation between the Epidemiologist Agent and the Doctor Critic Agent to define the criteria for patient outreach based on the target screening type. The output criteria are an age range (e.g., 40–70), gender, and relevant medical history.
2. **Select and identify patients based on the screening criteria**: The Data Analyst Agent filters patient data from a CSV file based on the defined criteria, including age range, gender, and medical conditions. The patient data were synthetically generated; you can find the sample data under [data/patients.csv](data/patients.csv).
3. **Generate outreach emails**: The program generates outreach emails for the filtered patients using LLMs and saves them as text files.
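The filtering in step 2 boils down to straightforward row selection. Here is a minimal sketch in plain Python, assuming illustrative column names (`age`, `gender`, `medical_history`) rather than the actual schema of `data/patients.csv`:

```python
import csv
import io

def filter_patients(csv_text, min_age, max_age, gender, conditions):
    """Return rows whose age falls in [min_age, max_age], whose gender matches,
    and whose medical history mentions any of the given conditions."""
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        age = int(row["age"])
        if not (min_age <= age <= max_age):
            continue
        if row["gender"].lower() != gender.lower():
            continue
        history = row["medical_history"].lower()
        if any(c.lower() in history for c in conditions):
            selected.append(row)
    return selected

# Tiny synthetic sample for illustration (the real data lives in data/patients.csv).
sample = """name,age,gender,medical_history
Alice,52,F,smoker; hypertension
Bob,35,M,none
Carol,68,F,family history of breast cancer
"""

matches = filter_patients(sample, 40, 70, "F", ["smoker", "family history"])
print([p["name"] for p in matches])  # ['Alice', 'Carol']
```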
## Setup

If you want to host the application on Hugging Face Spaces, the easiest way is to duplicate the Hugging Face Space and set up your own API secrets as detailed further below.

If you want a local copy of the application to run, you can clone the repository and then navigate into the folder with:

```bash
git clone https://huggingface.co/spaces/Intel/preventative_healthcare
cd preventative_healthcare
```

You can use the `uv` package to manage your virtual environment and dependencies. Just initialize the `uv` project and create the virtual environment:

```bash
...
```
### OpenAI API Key, Model Name, and Endpoint URL

1. If using the Hugging Face Spaces app, you can add your OpenAI-compatible API key and the model endpoint URL to the Hugging Face Settings under "Variables and secrets". They are read via `st.secrets` [in the app.py code](https://huggingface.co/spaces/Intel/preventative_healthcare/blob/main/app.py#L295).
2. If deploying a local version with the Streamlit frontend, you can add your details to a file under `.streamlit/secrets.toml` that looks like this:
```toml
OPENAI_API_KEY = "your-api-key"
OPENAI_BASE_URL = "https://api.inference.denvrdata.com/v1/"
```
3. Finally, if you just want to use the Python script without any front-end interface, you can add your API key to the [OAI_CONFIG_LIST.json](https://huggingface.co/spaces/Intel/preventative_healthcare/blob/main/OAI_CONFIG_LIST.json) file. Just don't expose your API key to the world! Set `api_key`, `model`, and `base_url` to the key, model name, and endpoint URL that you are using. The file should look like:
```json
[
  {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "base_url": "https://api.inference.denvrdata.com/v1/",
    "api_key": "openai_key",
    "price": [0.0, 0.0]
  },
  {
    "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
    "base_url": "https://api.inference.denvrdata.com/v1/",
    "api_key": "openai_key",
    "price": [0.0, 0.0]
  }
]
```
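A malformed config file is an easy way to break the script at startup, so a quick sanity check like the following can help. This is only a sketch using the key names from the example above, not code from the repository:

```python
import json

# Illustrative well-formedness check for an OAI_CONFIG_LIST-style file;
# the inline string stands in for reading the actual JSON file from disk.
config_text = """
[
  {"model": "meta-llama/Llama-3.3-70B-Instruct",
   "base_url": "https://api.inference.denvrdata.com/v1/",
   "api_key": "openai_key",
   "price": [0.0, 0.0]}
]
"""

configs = json.loads(config_text)
for cfg in configs:
    # Every entry needs a model name, endpoint URL, and API key.
    missing = {"model", "base_url", "api_key"} - cfg.keys()
    assert not missing, f"config entry missing keys: {missing}"
print(f"{len(configs)} model config(s) OK")
```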
 
### Modifying prompts

To modify prompts, you can edit them in the UI on the left sidebar, or you can edit them in the following files:

1. User proxy agent: the agent responsible for passing along the user's preventative healthcare task to the other agents.
[prompts/user_proxy_prompt.py](prompts/user_proxy_prompt.py)
2. Epidemiologist agent: the disease specialist agent who gathers the preventative healthcare task and decides on patient criteria.
[prompts/epidemiologist_prompt.py](prompts/epidemiologist_prompt.py)
3. Doctor Critic agent: reviews the criteria from the epidemiologist and passes them along. The output is used to filter actual patients from the patient data.
[prompts/doctor_critic_prompt.py](prompts/doctor_critic_prompt.py)
4. Outreach email: not an agent, but it still uses an LLM to build the outreach email.
[prompts/outreach_email_prompt.py](prompts/outreach_email_prompt.py)
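The prompt files above are ordinary Python modules. A hypothetical module of the same shape (the names and wording here are illustrative, not the repository's actual contents) might look like:

```python
# Hypothetical shape of a prompt module such as prompts/user_proxy_prompt.py;
# the real files may differ.
USER_PROXY_PROMPT = (
    "You are a user proxy for preventative healthcare outreach. "
    "Pass the user's screening task to the specialist agents "
    "and report their final criteria back verbatim."
)

def render(task: str) -> str:
    # Attach the concrete screening task to the reusable system prompt.
    return f"{USER_PROXY_PROMPT}\n\nTask: {task}"

print(render("breast cancer screening"))
```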
 
### Example Usage

If you want to run the app with Streamlit, you can run it locally with:

```bash
streamlit run app.py
```

...

The arguments are defined as follows:

- `--email`: Reply email address to include in the outreach emails. Default is `[email protected]`.
- `--name`: Name to include in the outreach emails. Default is `Benjamin Consolvo`.

The output emails will be saved as text files in the `data/` directory.
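The flags above map onto a standard `argparse` setup. The sketch below is a hypothetical reconstruction of just these two options, not the script's full CLI:

```python
import argparse

# Hypothetical reconstruction of the CLI flags documented above; the real
# script may define additional options. The default email is redacted in
# the README, so a placeholder string is used here.
parser = argparse.ArgumentParser(description="Preventative healthcare outreach")
parser.add_argument("--email", default="[email protected]",
                    help="Reply email address to include in the outreach emails")
parser.add_argument("--name", default="Benjamin Consolvo",
                    help="Name to include in the outreach emails")

# Parse an explicit argument list instead of sys.argv for demonstration.
args = parser.parse_args(["--email", "clinic@example.com", "--name", "Dr. Lee"])
print(args.email, args.name)
```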
### 6 Lessons Learned

Here are some lessons learned while building this preventative healthcare agentic application:

1. Some LLMs perform better than others at certain tasks. While this may seem obvious, in practice you often need to adjust which LLMs you use after seeing the results. For example, the [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) model was much more consistent and hallucinated less than [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) for email generation.
2. Setting temperature to 0 is important for getting a consistent output response from LLMs.
3. Prompt engineering is very important in the age of instructing LLMs on what to do:
   - Be specific and detailed
   - Give exact output format examples
   - Tell the LLM what to do, rather than telling it everything it should not do

   You can read more about prompt engineering in [OpenAI's best practices blog](https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api).
4. Certain tasks are easier to manage with traditional programming than by building an agent to do them. To get data consistently from a database in a specified format, write a function rather than building an agent; the LLM may hallucinate and not carry out the task correctly. The function [get_patients_from_criteria](intelpreventativehealthcare.py#L150) in the code filters patient data from a CSV file based on specified criteria. LLMs can hallucinate and invent data that are not part of the database, even when given specific instructions to only use data from the database! You will need to assess when to tell the agent to use specific tools.
5. Do operations asynchronously wherever possible. Instead of writing emails one by one in a for loop, write them all at once with `async`.
6. Code-writing tools like GitHub Copilot, Cursor, and Windsurf can save a lot of time, but you still need to pay attention to the output and understand what is going on in the code. A lot of unnecessary lines of code and technical debt will accumulate if you rely purely on code generation tools.
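Lesson 5 can be sketched with `asyncio.gather`, with a placeholder coroutine standing in for the real LLM call:

```python
import asyncio

# Sketch of lesson 5: draft all outreach emails concurrently instead of
# sequentially. draft_email stands in for a real (slow, network-bound) LLM call.
async def draft_email(patient: str) -> str:
    await asyncio.sleep(0.01)  # placeholder for LLM latency
    return f"Dear {patient}, you may be due for a preventative screening."

async def draft_all(patients: list[str]) -> list[str]:
    # gather() runs every draft at once, so total time is roughly
    # one call's latency rather than N calls in a row.
    return await asyncio.gather(*(draft_email(p) for p in patients))

emails = asyncio.run(draft_all(["Alice", "Bob", "Carol"]))
print(len(emails))  # 3
```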
 
## Follow Up

Connect to LLMs on Intel® Gaudi® AI accelerators with just an endpoint and an OpenAI-compatible API key, using the inference endpoint [Intel® AI for Enterprise Inference](https://github.com/opea-project/Enterprise-Inference), powered by OPEA. At the time of writing, the endpoint is available on cloud provider [Denvr Dataworks](https://www.denvrdata.com/intel).

Chat with 6K+ fellow developers on the [Intel DevHub Discord](https://discord.gg/kfJ3NKEw5t).

Follow [Intel Software on LinkedIn](https://www.linkedin.com/showcase/intel-software/).

For more Intel AI developer resources, see [developer.intel.com/ai](https://developer.intel.com/ai).