mgbam committed on
Commit a9c1bd3 · verified · 1 Parent(s): c2616df

Update README.md

Files changed (1)
  1. README.md +87 -77
README.md CHANGED
@@ -11,7 +11,7 @@ short_description: Medical Image diagnostic
  license: mit
  ---
  RadVision AI Advanced
- RadVision AI Advanced is a cutting-edge, Streamlit-based medical imaging analysis application designed to assist clinicians and researchers with rapid, AI-powered interpretation of both DICOM and standard image formats. The tool integrates advanced image processing, region-of-interest (ROI) selection, and multiple AI services—including language models for analysis, Q&A, and disease-specific evaluations—to generate detailed reports and insights on medical images.

  Table of Contents
  Overview
@@ -32,146 +32,156 @@ Contributing

  License

  Overview
- RadVision AI Advanced leverages state-of-the-art AI models to process and analyze medical images. The application supports:

- DICOM and Standard Images: Automatically detects and processes both DICOM and common image file formats (JPG, PNG).

- ROI Selection: Users can draw on images to define regions of interest using an integrated drawable canvas.

- Multi-Modal AI Analysis: Provides initial analyses, Q&A interactions, disease-specific evaluations, and confidence estimations.

- PDF Report Generation: Summarizes analysis results in a downloadable PDF report.

- The app is designed for research and educational purposes, and its outputs should be verified by clinical experts.

- Features
- Image Processing:

- DICOM parsing and metadata extraction.

- Window/Level adjustment with interactive sliders.

- Standard image processing using the Python Imaging Library (PIL).

- AI Integration:

- Initial analysis to describe and interpret the image.

- Q&A interface for detailed inquiries.

- Condition/disease-specific analysis.

- Confidence estimation on AI outputs.

- Fallback mechanisms using Hugging Face’s VQA models when primary methods fail.

- Reporting:

- Generation of PDF reports that include embedded images, session IDs, and formatted analysis results.

- User Interface:

- Streamlit-based UI with a clean two-column layout:

- Left: Image viewer and ROI selection.

- Right: Analysis results and interactive controls.

  File Structure
  ├── app.py # Main Streamlit application entry point.
- ├── dicom_utils.py # Functions to parse DICOM files, extract metadata, and convert images.
- ├── hf_models.py # Integration with Hugging Face Inference API for VQA fallback.
- ├── llm_interactions.py # Functions to interact with Gemini (and other LLMs) for analysis, Q&A, and more.
- ├── report_utils.py # Functions to generate PDF reports summarizing the session's analysis.
- ├── ui_helpers.py # Helper functions for the UI, including metadata display and window/level sliders.
  ├── requirements.txt # List of Python dependencies.
  └── README.md # Project documentation.
- app.py:
- Initializes the Streamlit interface, processes uploads, integrates all helper modules, and controls the overall workflow.

- dicom_utils.py:
- Contains functions for DICOM file parsing, metadata extraction, image conversion, and window/level handling.

- hf_models.py:
- Handles querying external VQA models (e.g., from Hugging Face) as a fallback for multimodal analysis.

- llm_interactions.py:
- Provides functions that interact with language models (like Gemini) to generate initial analyses, answer questions, run disease-specific evaluations, and estimate AI confidence.

- report_utils.py:
- Generates PDF reports summarizing the analysis session, including embedded images and formatted text.

- ui_helpers.py:
- Contains UI-related helper functions such as displaying DICOM metadata and creating interactive window/level sliders.

- Installation
- Clone the Repository:

  bash
  git clone https://github.com/yourusername/radvision-ai-advanced.git
  cd radvision-ai-advanced
- Create a Virtual Environment (Optional but Recommended):

  bash
  python -m venv venv
- source venv/bin/activate # On Windows use: venv\Scripts\activate
- Install Dependencies:

  bash
  pip install -r requirements.txt
- Note: Ensure you have the required libraries such as Streamlit, Pillow, pydicom, fpdf2, and requests installed.

  Configuration
- Before running the application, configure the following environment variables or add them to a secrets.toml file for deployment:

- HF_API_TOKEN:
- Your Hugging Face API token for accessing VQA models.

- GEMINI_API_KEY:
- API key for the Gemini language model service.

- GEMINI_MODEL_OVERRIDE (Optional):
- To override the default Gemini model name (e.g., "gemini-2.5-pro-exp-03-25").

- For local testing with Streamlit, you can add these variables to a .env file or configure them in your terminal session.

  Running the Application
  To start the application locally, run:

  bash
  streamlit run app.py
- The app will open in your default browser. You can then upload images, adjust DICOM window/level settings, run various AI analyses, and generate PDF reports.

  Usage Guide
- Upload an Image:
  Use the sidebar to upload a JPG, PNG, or DICOM file.

- Adjust DICOM Settings:
- If a DICOM image is detected, adjust the window center and width using the sliders.

- Run AI Analysis:
- Click the appropriate action buttons (e.g., "Run Initial Analysis", "Ask AI", "Run Condition Analysis") in the sidebar. You can also draw on the image to define a region of interest (ROI).

- View Results:
- Analysis results, Q&A responses, disease-specific insights, and confidence estimations will appear in the two-column layout.

- Generate a Report:
- Use the "Generate PDF Data" button to create a downloadable report summarizing your session.

  Contributing
- Contributions are welcome! Feel free to submit pull requests or open issues with suggestions, bug reports, or feature requests. Please adhere to standard coding practices and document your changes accordingly.

  License
  This project is open source and available under the MIT License.

- Disclaimer: This tool is intended for research and informational purposes only. Always consult a qualified healthcare professional for clinical interpretations and decisions.
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

  license: mit
  ---
  RadVision AI Advanced
+ RadVision AI Advanced is a cutting-edge, Streamlit-based medical imaging analysis application designed to assist clinicians and researchers with rapid, AI-powered interpretation of both DICOM and standard image formats. The tool integrates advanced image processing, region-of-interest (ROI) selection, and multiple AI services—including language models for analysis, Q&A, and disease-specific evaluations—to generate detailed reports and insights on medical images.

  Table of Contents
  Overview
 

  License

+ Configuration Reference
+
  Overview
+ RadVision AI Advanced leverages state-of-the-art AI models to process and analyze medical images. The application supports both DICOM files and common image formats (JPG, PNG) and provides a user-friendly, interactive interface. Key capabilities include:

+ Multi-Format Image Processing: Automatic detection and handling of DICOM images as well as standard image formats.

+ ROI Selection: Users can draw regions of interest on images using an integrated drawable canvas.

+ Multi-Modal AI Analysis: Provides initial analyses, interactive Q&A sessions, disease-specific evaluations, and confidence estimations.

+ PDF Report Generation: Summarizes analysis outputs in a downloadable PDF report.

+ Advanced Translation Functionality: Uses the deep-translator library with a Google Translate backend to detect and translate analysis text into multiple languages, preserving the original formatting (bullet points, numbering, spacing).

+ Note: This application is intended for research and educational use only. Always verify results with clinical experts.
 

+ Features
+ Image Processing
+ DICOM Support: Parse DICOM files and extract metadata.

+ Window/Level Adjustment: Interactive sliders to optimize image visualization (a minimal sketch follows this subsection).

+ Standard Image Processing: Utilizes the Python Imaging Library (PIL) for common image formats.
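
A minimal sketch of how such window/level mapping can be computed with pydicom and NumPy. The helper name, the test file path, and the hard-coded window values are illustrative assumptions, not the app's actual code:

python
import numpy as np
import pydicom
from PIL import Image

def apply_window_level(pixels: np.ndarray, center: float, width: float) -> Image.Image:
    """Clip raw pixel values to the window and rescale to 8-bit for display."""
    lo, hi = center - width / 2.0, center + width / 2.0
    clipped = np.clip(pixels.astype(np.float64), lo, hi)
    scaled = (clipped - lo) / max(hi - lo, 1e-6) * 255.0
    return Image.fromarray(scaled.astype(np.uint8))

# Hypothetical test file; for brevity this ignores RescaleSlope/RescaleIntercept,
# which a module like dicom_utils.py may also need to handle.
ds = pydicom.dcmread("example.dcm")
img = apply_window_level(ds.pixel_array, center=40, width=400)  # illustrative window values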

+ AI Integration
+ Initial Analysis: Automated interpretation of the uploaded image.

+ Q&A Interface: Enables users to ask questions about the image with region-of-interest support.

+ Disease-Specific Evaluation: Focused analysis for conditions such as pneumonia, tuberculosis, etc.

+ Confidence Estimation: Provides an AI-generated confidence score for the analysis.

+ Fallback Mechanisms: Uses external models (e.g., Hugging Face VQA APIs) when primary methods fail.
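
As an illustration of such a fallback, the following sketch queries a public VQA model through the Hugging Face Inference API. The model name, payload shape, and helper name are assumptions rather than the exact contents of hf_models.py:

python
import base64
import os
import requests

# Illustrative model choice; not necessarily the one the app uses.
VQA_URL = "https://api-inference.huggingface.co/models/dandelin/vilt-b32-finetuned-vqa"

def ask_vqa_fallback(image_path: str, question: str) -> str:
    """Send the image and question to a hosted VQA model and return the top answer."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    resp = requests.post(
        VQA_URL,
        headers={"Authorization": f"Bearer {os.environ['HF_API_TOKEN']}"},
        json={"inputs": {"image": image_b64, "question": question}},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()[0]["answer"]  # answers come back ranked by score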

+ Translation & Language Detection
+ Translation Module: Implements translation using the deep-translator library (Google Translate backend) with robust dependency checks and workarounds for known issues (see the sketch after this list).

+ Language Detection: Detects the language of provided text snippets before translation.

+ Formatting Preservation: Uses a few-shot prompt with examples to ensure bullet points, numbering, and spacing are preserved in the translation.
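
A minimal sketch of the described behavior, assuming the deep-translator package; the function below mirrors the idea of graceful degradation but is not the actual translation_models.py code:

python
try:
    from deep_translator import GoogleTranslator
    TRANSLATION_AVAILABLE = True
except ImportError:  # dependency check: degrade gracefully if the package is missing
    TRANSLATION_AVAILABLE = False

def translate_text(text: str, target_lang: str, source_lang: str = "auto") -> str:
    """Translate text, returning it unchanged if deep-translator is unavailable."""
    if not TRANSLATION_AVAILABLE or not text.strip():
        return text
    return GoogleTranslator(source=source_lang, target=target_lang).translate(text)

print(translate_text("Findings: no acute abnormality.", target_lang="fr"))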

+ Reporting
+ PDF Report Generation: Generates downloadable PDF reports that include embedded images, session IDs, and formatted text summaries.
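
For example, a report of this kind could be assembled with fpdf2; the layout and field names below are an illustrative sketch, not report_utils.py's actual implementation:

python
from fpdf import FPDF  # fpdf2 package
from fpdf.enums import XPos, YPos

def build_report(session_id: str, analysis_text: str, image_path: str) -> bytes:
    """Assemble a one-page PDF with a title, the analyzed image, and the summary text."""
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Helvetica", style="B", size=14)
    pdf.cell(0, 10, f"RadVision AI Report (session {session_id})",
             new_x=XPos.LMARGIN, new_y=YPos.NEXT)
    pdf.image(image_path, w=120)          # embed the analyzed image
    pdf.set_font("Helvetica", size=11)
    pdf.multi_cell(0, 6, analysis_text)   # formatted analysis summary
    return bytes(pdf.output())            # bytes suit a Streamlit download button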

+ User Interface
+ Streamlit-Based Layout: Clean two-column design.

+ Left Panel: Image viewer with ROI selection and DICOM metadata.

+ Right Panel: Analysis results, Q&A history, disease evaluation, confidence estimation, and translation features.
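
A skeletal version of that layout in Streamlit might look like the following; the widget labels are placeholders, not the exact ones in app.py:

python
import streamlit as st

st.set_page_config(page_title="RadVision AI Advanced", layout="wide")
left, right = st.columns(2)

with left:
    st.subheader("Image Viewer")
    uploaded = st.file_uploader("Upload a JPG, PNG, or DICOM file", type=["jpg", "png", "dcm"])

with right:
    st.subheader("Analysis Results")
    st.info("Run an analysis from the sidebar to see results here.")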

  File Structure
  ├── app.py # Main Streamlit application entry point.
+ ├── dicom_utils.py # DICOM parsing, metadata extraction, and image conversion functions.
+ ├── hf_models.py # Integration with external VQA models (e.g., Hugging Face) as a fallback.
+ ├── llm_interactions.py # Functions for interfacing with language models for analysis and Q&A.
+ ├── report_utils.py # Functions to generate PDF reports for analysis sessions.
+ ├── ui_helpers.py # Helper functions for UI elements (e.g., metadata display, window/level sliders).
+ ├── translation_models.py # Translation and language detection using deep-translator (Google Translate backend).
  ├── requirements.txt # List of Python dependencies.
  └── README.md # Project documentation.
+ app.py: Initializes the Streamlit interface, processes image uploads, integrates all modules, and controls the overall workflow.

+ dicom_utils.py: Handles DICOM file parsing, metadata extraction, image conversion, and window/level adjustments.

+ hf_models.py: Provides integration with external VQA models for fallback in multimodal analysis.

+ llm_interactions.py: Contains functions for communicating with large language models for initial analysis, Q&A, and confidence scoring (a sketch follows this section).

+ report_utils.py: Creates PDF reports summarizing the analysis session.

+ ui_helpers.py: Contains functions for UI enhancements like metadata display and interactive sliders.

+ translation_models.py: Implements translation and language detection using the deep-translator library.

+ Dependency Handling: Attempts to import deep-translator and gracefully degrades translation features if unavailable.
+
+ Workarounds: Applies workarounds for known issues with certain exceptions raised by the translation backend.
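
A hedged sketch of the kind of Gemini call llm_interactions.py performs, using the google-generativeai client; the prompt text and helper name are illustrative, and the model fallback reuses the name quoted under Configuration:

python
import os
import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model_name = os.environ.get("GEMINI_MODEL_OVERRIDE", "gemini-2.5-pro-exp-03-25")
model = genai.GenerativeModel(model_name)

def run_initial_analysis(image: Image.Image) -> str:
    """Request a first-pass description of the image for research use."""
    prompt = "Describe notable findings in this medical image for research purposes."
    response = model.generate_content([prompt, image])  # multimodal: text + PIL image
    return response.text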
+
+ Installation
+ 1. Clone the Repository
  bash
  git clone https://github.com/yourusername/radvision-ai-advanced.git
  cd radvision-ai-advanced
+ 2. Create a Virtual Environment (Optional but Recommended)

  bash
  python -m venv venv
+ source venv/bin/activate # On Windows: venv\Scripts\activate
+ 3. Install Dependencies

  bash
  pip install -r requirements.txt
+ Ensure you have the required libraries such as Streamlit, Pillow, pydicom, deep-translator, fpdf2, and transformers installed.

  Configuration
+ Before running the application, configure the following environment variables or add them to a secrets.toml file:

+ HF_API_TOKEN: Your Hugging Face API token for VQA fallback.

+ GEMINI_API_KEY: API key for the Gemini language model service.

+ GEMINI_MODEL_OVERRIDE (Optional): Override for the default Gemini model name (e.g., "gemini-2.5-pro-exp-03-25").

+ For local testing, these variables can be added to a .env file or set in your terminal session.
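
One way to read these values, shown as a sketch that assumes the key names above; the lookup helper itself is illustrative:

python
import os
from typing import Optional

import streamlit as st

def get_secret(name: str, default: Optional[str] = None) -> Optional[str]:
    """Prefer Streamlit secrets (secrets.toml), then fall back to environment variables."""
    try:
        if name in st.secrets:
            return st.secrets[name]
    except FileNotFoundError:  # no secrets.toml present; fall through to the environment
        pass
    return os.environ.get(name, default)

HF_API_TOKEN = get_secret("HF_API_TOKEN")
GEMINI_API_KEY = get_secret("GEMINI_API_KEY")
GEMINI_MODEL = get_secret("GEMINI_MODEL_OVERRIDE", "gemini-2.5-pro-exp-03-25")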

  Running the Application
  To start the application locally, run:

  bash
  streamlit run app.py
+ The app will open in your default browser. From there, you can upload images, adjust DICOM settings, perform AI analysis, access translation features, and generate PDF reports.

  Usage Guide
+ Upload an Image
  Use the sidebar to upload a JPG, PNG, or DICOM file.

+ Adjust DICOM Settings
+ For DICOM images, use interactive window/level sliders to optimize visualization.
+
+ Run AI Analysis
+ Click the action buttons (e.g., "Run Initial Analysis", "Ask AI", "Run Condition Analysis") in the sidebar. Optionally, draw an ROI on the image (an ROI-capture sketch follows this guide).

+ Translation Functionality
+ In the Translation tab, select the text to translate (e.g., Initial Analysis).

+ Choose "Auto-Detect" for the source language (or select a language manually) and choose a target language.

+ The system uses deep-translator to detect the source language and then translates the text. The few-shot prompt provided in the app helps preserve formatting such as bullet points and numbering.
+
+ View Analysis Results
+ The right panel displays analysis results—including initial analysis, Q&A history, condition evaluation, confidence scores, and translations—in a clean, tabbed layout.
+
+ Generate a Report
+ Use the "Generate PDF Data" button to create a downloadable PDF report summarizing your session.
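
For reference, ROI capture with the streamlit-drawable-canvas component might look like this sketch; the parameter choices, file path, and widget key are assumptions:

python
import streamlit as st
from PIL import Image
from streamlit_drawable_canvas import st_canvas

image = Image.open("example.png")  # stands in for the uploaded image

canvas = st_canvas(
    background_image=image,
    drawing_mode="rect",   # rectangular ROIs
    stroke_width=2,
    height=image.height,
    width=image.width,
    key="roi_canvas",
)

if canvas.json_data and canvas.json_data["objects"]:
    r = canvas.json_data["objects"][-1]  # most recently drawn rectangle
    st.write(f"Selected ROI (x, y, w, h): ({r['left']}, {r['top']}, {r['width']}, {r['height']})")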

  Contributing
+ Contributions are welcome! Please submit pull requests or open issues for bug fixes, improvements, or new features. Follow standard coding practices and document your changes.

  License
  This project is open source and available under the MIT License.

+ Configuration Reference
+ For advanced configuration options for Hugging Face Spaces and similar deployment scenarios, refer to the Hugging Face Spaces configuration reference at https://huggingface.co/docs/hub/spaces-config-reference.

+ Disclaimer
+ This tool is intended for research and informational purposes only. AI outputs should be verified by clinical experts, and the tool is not intended for clinical decision-making without professional validation.