---
language:
  - en
library_name: transformers
tags:
  - jiviai
  - medical ai
widget:
  - example_title: X-ray Report Generation
    messages:
      - role: user
        content:
          - type: text
            text: Generate an x-ray report for the above x-ray.
          - type: image_url
            image_url:
              url: >-
                https://jarvis-01j48hrq5383vpdk8csp3r60xa.s3.amazonaws.com/dev/MISC/2024-10-04/01J9BCTFZBH1AMD246G55DCY93.jpeg
      - role: assistant
        content: |
          Patient: [Name not provided]
          Gender: Female
          Age: [Age not provided]
          Clinical History: [Not provided]
          View: PA (Posterior-Anterior)

          Image Quality: The image is of good quality. The patient positioning is adequate, and the exposure is appropriate for interpretation.

          Findings:
          * The lung fields are clear bilaterally, with no evidence of consolidation, masses, or infiltrates.
          * The heart size is within normal limits, and the cardiac silhouette is unremarkable.
          * The diaphragm and costophrenic angles are clear.
          * A pacemaker and leads are present in the right upper chest, with the pacemaker generator in the left upper chest.
          * No acute bony abnormalities are identified.
          * No pneumothorax, pleural effusion, or consolidation is seen.

          Conclusion:
          * No acute cardiopulmonary abnormality is detected.
          * The presence of a pacemaker and leads suggests a history of cardiac rhythm management.
          * Clinical correlation is advised for further management.
pipeline_tag: image-text-to-text
---

# Jivi-RadX: Large Language Vision Assistant for X-rays

Introducing **Jivi-RadX-v1**, an advanced visual language model designed for image reasoning in the healthcare domain. The model interprets radiographic X-ray images and provides accurate, insightful responses to a wide range of diagnostic and analytical questions, whether aiding clinicians in making informed decisions or assisting researchers in understanding complex image data.

**Model Architecture:** `Jivi-RadX-v1` is built on top of the Llama 3.1 text-only model, an auto-regressive language model that uses an optimized transformer architecture. To support image recognition tasks, we use a separately trained vision encoder and a vision projector that integrate with our base language model.

## Benchmarks

We have released our x-ray benchmark [jivi_chexnet](https://huggingface.co/datasets/jiviai/jivi_chexnet), which combines rich and diverse x-ray images with verified questions and answers about them. Below is a comparison of our model with other open-source and closed-source vision LLMs.

| Model | CheXpert (Stanford dataset) | NIH Chest X-ray (NIH dataset) | Overall Accuracy |
|------------------------------|-------|-------|-----|
| **Jivi AI (Jivi-RadX)**      | **85%** | **64.5%** | **75%** |
| OpenAI (GPT-4o)              | 81.5% | 66.3% | 74% |
| Stanford AIMI (CheXagent-8b) | 55.6% | 50.3% | 53% |
| Microsoft (LLaVA-Med)        | 51%   | 46.6% | 49% |
| Google (Gemini 1.5 Pro)*     | 36%   | 58%   | 47% |

\* Gemini 1.5 Pro occasionally declines to respond to medical questions, which affects its overall accuracy.

Benchmark numbers were calculated using [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) by *LMMs-Lab*.

## Training process

*(Figure: Jivi-RadX-v1 training steps)*

## Training Data

`Jivi-RadX-v1` was pretrained on 365k medical image–text pairs. The instruction-tuning data includes over 280k synthetically generated examples.
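The exact schema of the instruction-tuning corpus is not published. Purely as an illustration, a single record in the common LLaVA-style conversation format might look like the sketch below; the field names, identifier, and image path are hypothetical.

```python
# Hypothetical sketch of one instruction-tuning record in a LLaVA-style
# conversation format. The actual schema used for Jivi-RadX-v1 is not
# published; the field names, id, and image path are illustrative only.
example_record = {
    "id": "cxr-000001",                 # hypothetical sample identifier
    "image": "images/cxr-000001.jpeg",  # hypothetical path to the x-ray image
    "conversations": [
        {
            "from": "human",
            "value": "<image>\nGenerate an x-ray report for the above x-ray.",
        },
        {
            "from": "gpt",
            "value": "Findings: The lung fields are clear bilaterally ...",
        },
    ],
}
```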
### Synthetic Data Generation

We leveraged various closed-source and open-source visual LLMs and used the metadata of the x-ray images to generate rich captions for training.

## How to use

### Use with transformers

Please ensure `transformers>=4.45.2`.

```python
import requests
import torch
from PIL import Image
from transformers import (AutoProcessor, AutoTokenizer,
                          LlavaForConditionalGeneration)

# The "<image>" placeholder marks where the image embeddings are inserted.
conversation = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {
        "role": "user",
        "content": "<image>\n Please describe this x-ray.",
    },
]

model_id = "jiviai/Jivi-RadX-v1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    attn_implementation="eager",
    device_map="cuda",
    torch_dtype=torch.float16,
)

# Build the prompt text from the chat template; the processor pairs it
# with the image below.
prompt = tokenizer.apply_chat_template(
    conversation, tokenize=False, add_generation_prompt=True
)

# Download an example chest x-ray.
url = "https://jarvis-01j48hrq5383vpdk8csp3r60xa.s3.amazonaws.com/dev/MISC/2024-10-03/01J991DRQ2G5TAB24A9QNMFAXN.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, dtype=model.dtype
)

# Generate
generate_ids = model.generate(**inputs, max_new_tokens=30)
output = processor.decode(
    generate_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output)
```

**Supported Languages:** Currently we only support English. We plan to introduce multilingual support shortly.

**Feedback:** To send any feedback or questions, please use the community section of the model.

## Intended use

The data, code, and model checkpoints are intended to be used solely for:

1. Future research on visual-language processing.
2. Reproducibility of the experimental results reported in the reference paper.

### Disclaimer

The data, code, and model checkpoints are not intended to be used in clinical care or for any clinical decision-making purposes.