# Experiments
Below are instructions for running experiments on our novel ChestAgentBench and on the previous SoTA benchmark, CheXbench. ChestAgentBench is a comprehensive benchmark containing over 2,500 complex medical queries across 8 diverse categories.
### ChestAgentBench
To run GPT-4o on ChestAgentBench, enter the `experiments` directory and run the following script:
```bash
python benchmark_gpt4o.py
```
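The script talks to the OpenAI API, so you'll need credentials in place first. A minimal setup sketch, assuming `benchmark_gpt4o.py` reads the standard `OPENAI_API_KEY` environment variable:
```bash
# Assumption: benchmark_gpt4o.py authenticates via the standard OpenAI env var.
export OPENAI_API_KEY="sk-..."
```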
To run Llama 3.2 Vision 90B on ChestAgentBench, run the following:
```bash
python benchmark_llama.py
```
To run CheXagent on ChestAgentBench, run the following:
```bash
python benchmark_chexagent.py
```
To run LLaVA-Med on ChestAgentBench, first clone the LLaVA-Med repo and follow its setup instructions, then copy the benchmark script into it and run it:
```bash
mv benchmark_llavamed.py ~/LLaVA-Med/llava/serve
python -m llava.serve.benchmark_llavamed --model-name llava-med-v1.5-mistral-7b --controller http://localhost:10000
```
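Note that the `--controller` flag expects a LLaVA controller already listening on port 10000. A sketch of that serving setup, based on the standard LLaVA serving commands (the exact flags, the worker port, and the model path are assumptions; defer to the LLaVA-Med repo's instructions):
```bash
# Assumption: standard LLaVA-style serving setup; check the LLaVA-Med repo for exact flags.
python -m llava.serve.controller --host 0.0.0.0 --port 10000
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 \
    --port 40000 --worker http://localhost:40000 \
    --model-path microsoft/llava-med-v1.5-mistral-7b
```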
If you want to inspect the logs, you can run the following. It will select the most recent log file by default.
```bash
python inspect_logs.py [optional: log-file] -n [num-logs]
```
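For example, to print the five most recent entries of a specific run (the log filename here is hypothetical):
```bash
# Hypothetical log file; -n caps the number of logs shown.
python inspect_logs.py results/gpt4o_run.json -n 5
```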
Finally, to analyze results, run:
```bash
python analyze_axes.py results/[logfile].json ../benchmark/questions/ --model [gpt4|llama|chexagent|llava-med] --max-questions [optional:int]
```
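For example, a filled-in invocation scoring the first 100 questions of a GPT-4o run (the log filename is hypothetical):
```bash
# Hypothetical log file name; --max-questions is optional and caps the question count.
python analyze_axes.py results/gpt4o_run.json ../benchmark/questions/ --model gpt4 --max-questions 100
```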
### CheXbench
To run the models on CheXbench, you can use `chexbench_gpt4.py` as a reference. You'll need to download the dataset files locally and upload them with each request. Rad-ReStruct and Open-I share the same set of images, so you can download `NLMCXR.zip` once and copy the images into both directories (see the sketch after the list below).
You can find the datasets here:
1. [SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering](https://www.med-vqa.com/slake/). Save this to `MedMAX/data/slake`.
2. [Rad-ReStruct: A Novel VQA Benchmark and Method for Structured Radiology Reporting](https://github.com/ChantalMP/Rad-ReStruct). Save the images to `MedMAX/data/rad-restruct/images`.
3. [Open-I Service of the National Library of Medicine](https://openi.nlm.nih.gov/faq). Save the images to `MedMAX/data/openi/images`.
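A minimal sketch of the shared-image setup, assuming `NLMCXR.zip` unpacks to a flat directory of images (the actual archive layout may differ):
```bash
# Unpack the shared NLMCXR images once, then copy them into both dataset directories.
unzip NLMCXR.zip -d nlmcxr_images
mkdir -p MedMAX/data/rad-restruct/images MedMAX/data/openi/images
cp -r nlmcxr_images/. MedMAX/data/rad-restruct/images/
cp -r nlmcxr_images/. MedMAX/data/openi/images/
```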
Once you're finished, fix the paths in `chexbench.json` to point at your local copies using the `MedMax/data/fix_chexbench.py` script.
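Assuming the script takes no arguments and rewrites `chexbench.json` in place (check the script itself for its actual interface):
```bash
# Assumption: no arguments; edits chexbench.json in place.
python MedMax/data/fix_chexbench.py
```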
### Compare Runs
To analyze a single results file for overall accuracy and for accuracy along the different axes:
```bash
python compare_runs.py results/medmax.json
```
For a direct evaluation comparing **2** models on the exact same questions:
```bash
python compare_runs.py results/medmax.json results/gpt4o.json
```
For a direct evaluation comparing **ALL** models on the exact same questions (add as many model log files as you want):
```bash
python compare_runs.py results/medmax.json results/gpt4o.json results/llama.json results/chexagent.json results/llavamed.json
```