---
language:
- en
library_name: transformers
tags:
- jiviai
- medical ai
widget:
- example_title: X-ray Report Generation
messages:
- role: user
content:
- type: text
text: Generate an xray report for the above xray.
- type: image_url
image_url:
url: >-
https://jarvis-01j48hrq5383vpdk8csp3r60xa.s3.amazonaws.com/dev/MISC/2024-10-04/01J9BCTFZBH1AMD246G55DCY93.jpeg
- role: assistant
content: |
Patient: [Name not provided]
Gender: Female
Age: [Age not provided]
Clinical History: [Not provided]
View: PA (Posterior-Anterior)
Image Quality: The image is of good quality. The patient positioning is adequate, and the exposure is appropriate for interpretation.
Findings:
* The lung fields are clear bilaterally, with no evidence of consolidation, masses, or infiltrates.
* The heart size is within normal limits, and the cardiac silhouette is unremarkable.
* The diaphragm and costophrenic angles are clear.
* A pacemaker and leads are present in the right upper chest, with the pacemaker generator in the left upper chest.
* No acute bony abnormalities are identified.
* No pneumothorax, pleural effusion, or consolidation is seen.
Conclusion:
* No acute cardiopulmonary abnormality is detected.
* The presence of a pacemaker and leads suggests a history of cardiac rhythm management.
* Clinical correlation is advised for further management.
pipeline_tag: image-text-to-text
---
# Jivi-RadX: Large Language Vision Assistant for X-rays
Introducing **Jivi-RadX-v1**, an advanced visual language model designed for sophisticated image reasoning in the healthcare domain. The model excels at interpreting radiographic X-ray images, offering accurate and insightful responses to a wide range of diagnostic and analytical questions. Whether aiding clinicians in making informed decisions or assisting researchers in understanding complex image data, Jivi-RadX-v1 pushes the boundaries of medical imaging analysis.
<img src="https://jarvis-01j48hrq5383vpdk8csp3r60xa.s3.amazonaws.com/dev/MISC/2024-10-03/01J98VPJTRYQSKFEB888Q9QDTQ.webp" alt="Jivi-RadX-v1" width="600" height="600">
**Model Architecture:** `Jivi-RadX-v1` is built on top of the Llama 3.1 text-only model, an auto-regressive language model that uses an optimized transformer architecture. To support image recognition tasks, we pair the base language model with a separately trained vision encoder and a vision projector that integrates the two.
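The exact encoder and projector details are not published, but the LLaVA-style pattern described above can be sketched in miniature: patch features from a vision encoder are mapped by a small MLP projector into the language model's embedding space, where they are consumed as ordinary tokens. The dimensions below are illustrative placeholders, not the model's actual sizes.

```python
import torch
import torch.nn as nn

class VisionProjector(nn.Module):
    """Two-layer MLP mapping vision-encoder patch features into the LLM embedding space."""
    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        return self.proj(patch_features)

# Toy dimensions for illustration only (not the real configuration).
vision_dim, llm_dim = 1024, 4096
projector = VisionProjector(vision_dim, llm_dim)

patches = torch.randn(1, 576, vision_dim)  # [batch, num_patches, vision_dim]
image_tokens = projector(patches)          # [batch, num_patches, llm_dim]
print(tuple(image_tokens.shape))           # (1, 576, 4096)
```

The projected `image_tokens` are then interleaved with the text token embeddings (at the `<image>` placeholder position) before being fed to the language model.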
## Benchmarks
We have released our x-ray benchmark [jivi_chexnet](https://huggingface.co/datasets/jiviai/jivi_chexnet), which combines rich and diverse x-ray images with verified question-and-answer pairs. Below is a comparison of our model against other open-source and closed-source vision LLMs.
| Model | CheXpert (Stanford dataset) | NIH Chest X-Ray (NIH dataset) | Overall Accuracy |
|------------------|-----------------------------|----------------------------------|------------------|
| **Jivi AI (Jivi-RadX)** | **85%** | **64.5%** | **75%** |
| Open AI (GPT-4o) | 81.5% | 66.3% | 74% |
| Stanford AIMI (ChexAgent-8b) | 55.6% | 50.3% | 53% |
| Microsoft (Llava-Med) | 51% | 46.6% | 49% |
| *Google (Gemini 1.5 Pro) | 36% | 58% | 47% |
*Gemini 1.5 Pro occasionally declines to respond to medical questions, which affects its overall accuracy.
*Benchmark numbers were calculated using [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) by *LMMs-Lab*.
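As a quick sanity check on the table, the Overall Accuracy column is consistent with an unweighted mean of the two per-dataset scores (assuming equal weighting, which the source does not state explicitly):

```python
# Jivi-RadX row from the benchmark table above.
chexpert, nih = 85.0, 64.5
overall = (chexpert + nih) / 2
print(round(overall))  # 75

# GPT-4o row, same calculation: (81.5 + 66.3) / 2 = 73.9 -> 74
gpt4o_overall = (81.5 + 66.3) / 2
print(round(gpt4o_overall))  # 74
```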
## Training process
<img src="https://jarvis-01j48hrq5383vpdk8csp3r60xa.s3.amazonaws.com/dev/MISC/2024-10-03/01J99R9KTP0YSDRKEKDJSFDZZ2.png" style='border:0.5px solid' alt="Jivi-RadX-v1 steps" height="600">
## Training Data
`Jivi-RadX-v1` was pretrained on 365k medical image and text pairs. The instruction tuning data includes over 280k synthetically generated examples.
### Synthetic Data Generation
We leveraged various closed-source and open-source visual LLMs and used the metadata of the x-ray images to generate rich captions for training.
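The metadata schema and prompting strategy are not published, so the following is only a hypothetical sketch of how x-ray metadata might be turned into a captioning prompt for a visual LLM; the field names (`view`, `finding`, `age`, `gender`) are invented for illustration.

```python
# Hypothetical metadata record; the actual schema is not published.
metadata = {"view": "PA", "finding": "cardiomegaly", "age": 62, "gender": "female"}

def build_caption_prompt(meta: dict) -> str:
    """Compose a caption-generation prompt for a visual LLM from x-ray metadata."""
    return (
        f"This is a {meta['view']}-view chest x-ray of a "
        f"{meta['age']}-year-old {meta['gender']} patient "
        f"with a reported finding of {meta['finding']}. "
        "Write a detailed radiology-style caption describing the image."
    )

prompt = build_caption_prompt(metadata)
print(prompt)
```

Each generated caption would then be paired with its source image to form a synthetic training example.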
## How to use
### Use with transformers
Please ensure `transformers>=4.45.2` is installed.
```python
import requests
import torch
from PIL import Image
from transformers import (AutoProcessor, AutoTokenizer,
LlavaForConditionalGeneration)
conversation = [
    {"role": "system", "content": "You are a helpful AI assistant."},
{
"role": "user",
"content": "<image>\n Please describe this x-ray.",
},
]
model_id = "jiviai/Jivi-RadX-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
model_id, attn_implementation="eager", device_map="cuda", torch_dtype=torch.float16
)
prompt = tokenizer.apply_chat_template(
conversation, tokenize=False, add_generation_prompt=True
)
url = "https://jarvis-01j48hrq5383vpdk8csp3r60xa.s3.amazonaws.com/dev/MISC/2024-10-03/01J991DRQ2G5TAB24A9QNMFAXN.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(
model.device, dtype=model.dtype
)
# Generate
generate_ids = model.generate(**inputs, max_new_tokens=30)
output = processor.decode(
generate_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output)
```
**Supported Languages:** Currently we only support English. We plan to introduce multilingual support shortly.
**Feedback:** To send feedback or questions, please use the community section of the model page.
## Intended use
The data, code, and model checkpoints are intended to be used solely for:
1. Future research on visual-language processing.
2. Reproducibility of the experimental results reported in the reference paper.
### Disclaimer
The data, code, and model checkpoints are not intended to be used in clinical care or for any clinical decision-making purposes.