# Experiments
Below are the instructions for running experiments with our new ChestAgentBench and with CheXbench, the previous state-of-the-art benchmark. ChestAgentBench is a comprehensive benchmark containing over 2,500 complex medical queries across 8 diverse categories.
### ChestAgentBench
To run GPT-4o on ChestAgentBench, enter the `experiments` directory and run the following script:
```bash
python benchmark_gpt4o.py
```
To run Llama 3.2 Vision 90B on ChestAgentBench, run the following:
```bash
python benchmark_llama.py
```
To run CheXagent on ChestAgentBench, run the following:
```bash
python benchmark_chexagent.py
```
To run LLaVA-Med on ChestAgentBench, clone the LLaVA-Med repo and follow its setup instructions, then copy the benchmark script into it and launch it as a module:
```bash
mv benchmark_llavamed.py ~/LLaVA-Med/llava/serve
python -m llava.serve.benchmark_llavamed --model-name llava-med-v1.5-mistral-7b --controller http://localhost:10000
```
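The benchmark script expects a LLaVA-Med controller listening on port 10000, so the serving stack must be up first. A typical launch sequence, based on the standard LLaVA serving setup (the flag names and model path here are assumptions; check the LLaVA-Med README for the exact commands):
```bash
# Terminal 1: start the controller the benchmark script connects to.
python -m llava.serve.controller --host 0.0.0.0 --port 10000

# Terminal 2: start a model worker and register it with the controller.
python -m llava.serve.model_worker --host 0.0.0.0 --port 40000 \
    --controller http://localhost:10000 --worker http://localhost:40000 \
    --model-path microsoft/llava-med-v1.5-mistral-7b
```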
If you want to inspect the logs, you can run the following. It will select the most recent log file by default.
```bash
python inspect_logs.py [optional: log-file] -n [num-logs]
```
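For example, to show the five most recent entries from a particular run (the log file name is a placeholder, and we assume `-n` caps how many entries are printed):
```bash
python inspect_logs.py results/gpt4o.json -n 5
```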
Finally, to analyze results, run:
```bash
python analyze_axes.py results/[logfile].json ../benchmark/questions/ --model [gpt4|llama|chexagent|llava-med] --max-questions [optional:int]
```
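For example, a GPT-4o run limited to the first 100 questions might look like this (the log file name is a placeholder):
```bash
python analyze_axes.py results/gpt4o.json ../benchmark/questions/ --model gpt4 --max-questions 100
```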
### CheXbench
To run the models on CheXbench, you can use `chexbench_gpt4.py` as a reference: you'll need to download the dataset files locally and upload them with each request. Rad-ReStruct and Open-I use the same set of images, so you can download the `NLMCXR.zip` file just once and copy the images into both directories (see the copy commands after the dataset list below).
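The per-request upload pattern looks roughly like this. This is a minimal Python sketch of sending a local image to GPT-4o as a base64 data URL, not the repo's actual `chexbench_gpt4.py`; the image path and question are placeholders:
```python
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_about_image(image_path: str, question: str) -> str:
    # Encode the local image as a base64 data URL so it can be
    # uploaded inline with the request.
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(ask_about_image("data/slake/imgs/example.png", "Is there a pleural effusion?"))
```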
You can find the datasets here:
1. [SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering](https://www.med-vqa.com/slake/). Save this to `MedMAX/data/slake`.
2. [Rad-ReStruct: A Novel VQA Benchmark and Method for Structured Radiology Reporting](https://github.com/ChantalMP/Rad-ReStruct). Save the images to `MedMAX/data/rad-restruct/images`.
3. [Open-I Service of the National Library of Medicine](https://openi.nlm.nih.gov/faq). Save the images to `MedMAX/data/openi/images`.
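Since Rad-ReStruct and Open-I share the NLMCXR images, the one-time download and copy looks roughly like this (the zip's internal layout is an assumption; adjust the source path to match what you actually extract):
```bash
# Extract the shared NLMCXR image set once...
unzip NLMCXR.zip -d NLMCXR
# ...then copy the images into both dataset directories.
mkdir -p MedMAX/data/rad-restruct/images MedMAX/data/openi/images
cp -r NLMCXR/. MedMAX/data/rad-restruct/images/
cp -r NLMCXR/. MedMAX/data/openi/images/
```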
Once you're finished, you'll want to fix the paths in the `chexbench.json` file to your local paths using the `MedMax/data/fix_chexbench.py` script.
### Compare Runs
Analyze a single run's overall accuracy and its accuracy along the different question axes:
```bash
python compare_runs.py results/medmax.json
```
For a direct evaluation comparing **2** models on the exact same questions:
```bash
python compare_runs.py results/medmax.json results/gpt4o.json
```
For a direct evaluation comparing **ALL** models on the exact same questions (add as many model log files as you want):
```bash
python compare_runs.py results/medmax.json results/gpt4o.json results/llama.json results/chexagent.json results/llavamed.json
```