3v324v23 committed
Commit 1b4d479 · 1 Parent(s): 466431f

Added Homepage

Files changed (2)
  1. app.py → HomePage.py +0 -0
  2. pages/Dataset.py +111 -41
app.py → HomePage.py RENAMED
File without changes
pages/Dataset.py CHANGED
@@ -2,61 +2,131 @@ import streamlit as st
  import pandas as pd
  import os
  from PIL import Image

  st.set_page_config(layout="wide")
- st.title("📂 Dataset Information")

- # Introduction
- st.markdown("""
- ### 🧾 Dataset Overview

- ## Dataset Descrption

- ### DDR dataset contains 13,673 fundus images from 147 hospitals, covering 23 provinces in China. The images are classified into 5 classes according to DR severity: none, mild, moderate, severe, and proliferative DR. There is a sixth category which indicates the images with poor quality. The dataset presented here does not include the images with poor quality (sixth category) and all images have been preprocessed to delete the black background. https://www.kaggle.com/datasets/mariaherrerot/ddrdataset

- - **No_DR**
- - **Mild**
- - **Moderate**
- - **Severe**
- - **Proliferative_DR**
- """)

- # Dataset preparation explanation
- st.markdown("""
- ### 🧪 Data Preparation & Splitting

- The original dataset was preprocessed and resized to **224x224 pixels**. It was then split into three sets:

- - **Training Set**: Used to train the model.
- - **Validation Set** *(optional)*: Used to fine-tune hyperparameters.
- - **Testing Set**: Used for final model evaluation.

- We used an 80-20 stratified split:
- - **80%** of the data was used for training.
- - **20%** was reserved for testing, ensuring each class was proportionally represented.

- A CSV file (`test_labels.csv`) was created for the test set, containing the filenames and their corresponding class labels.
- """)
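The removed page text above describes an 80/20 class-stratified split that produced `test_labels.csv`. The splitting script itself is not part of this commit, so the following is only a minimal sketch of how such a split might be generated, assuming pandas and scikit-learn and a hypothetical `labels.csv` with `filename` and `label` columns:

```python
# Illustrative sketch only (not part of this commit): an 80/20 class-stratified
# split that writes test_labels.csv. File and column names are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split

labels = pd.read_csv("labels.csv")  # one row per image: "filename", "label"

train_df, test_df = train_test_split(
    labels,
    test_size=0.20,            # 20% held out for testing
    stratify=labels["label"],  # keep each class proportionally represented
    random_state=42,
)

train_df.to_csv("train_labels.csv", index=False)
test_df.to_csv("test_labels.csv", index=False)  # filenames + class labels for the test split
```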
- # Visualizing the test dataset
- st.markdown("### 📸 Sample Images from Test Dataset")

- csv_path = "D:/DR_Classification/splits/test_labels.csv"
- img_dir = "D:/DR_Classification/splits/test"

- try:
-     df = pd.read_csv(csv_path)
-     class_names = df.iloc[:, 1].unique()

-     for class_name in class_names:
-         st.subheader(f"🔍 Class: {class_name}")
-         class_samples = df[df.iloc[:, 1] == class_name].head(3)
-         cols = st.columns(len(class_samples))

-         for i, row in enumerate(class_samples.itertuples()):
-             img_path = os.path.join(img_dir, row[1])
              if os.path.exists(img_path):
-                 image = Image.open(img_path).convert('RGB')
-                 cols[i].image(image, caption=row[1], use_column_width=True)
- except Exception as e:
-     st.error(f"Error loading dataset: {e}")

  import pandas as pd
  import os
  from PIL import Image
+ import matplotlib.pyplot as plt
+ import seaborn as sns

  st.set_page_config(layout="wide")
+ st.title("🩺 Diabetic Retinopathy Project")

+ # Tabs
+ tab1, tab2, tab3 = st.tabs(["📂 Dataset Info", "📊 Training Visualization", "🤖 Algorithm Used"])

+ # =============================
+ # Tab 1: Dataset Information
+ # =============================
+ with tab1:
+     st.markdown("""
+     ### 🧾 Dataset Overview

+     **Dataset Description:**

+     The DDR dataset contains **13,673 fundus images** from **147 hospitals** across **23 provinces in China**. The images are labeled into 5 classes based on DR severity:

+     - **No_DR**
+     - **Mild**
+     - **Moderate**
+     - **Severe**
+     - **Proliferative_DR**

+     Poor-quality images were removed, and black backgrounds were deleted.
+     [📎 Dataset source](https://www.kaggle.com/datasets/mariaherrerot/ddrdataset)

+     ### 🧪 Data Preparation & Splitting

+     - All images resized to **224x224**
+     - **80% Training**, **20% Testing** (stratified by class)
+     """)

+ # =============================
+ # Tab 2: Training Visualization
+ # =============================
+ with tab2:
+     st.markdown("### 📊 Training Data Class Distribution")
+
+     # CSV path and image folder path (adjust as needed)
+     CSV_PATH = r"D:\DR_Classification\dataset\DR_grading.csv"
+     IMG_FOLDER = r"D:\DR_Classification\dataset\images"  # Folder where all images are stored

+     # Load CSV
+     df = pd.read_csv(CSV_PATH)

+     # Map the numeric 'diagnosis' column (0 to 4) to a readable 'label' column
+     label_map = {
+         0: "No_DR",
+         1: "Mild",
+         2: "Moderate",
+         3: "Severe",
+         4: "Proliferative_DR"
+     }
+     df['label'] = df['diagnosis'].map(label_map)

+     # --- Metric 1: Class Distribution ---
+     st.subheader("1️⃣ Class Distribution")
+     class_counts = df['label'].value_counts().reset_index()
+     class_counts.columns = ['Class', 'Count']

+     fig1, ax1 = plt.subplots()
+     sns.barplot(data=class_counts, x='Class', y='Count', palette='rocket', ax=ax1)
+     ax1.set_title("Class Distribution")
+     st.pyplot(fig1)

+     # --- Metric 2: Sample Images Per Class ---
+     st.subheader("2️⃣ Sample Images Per Class")
+
+     cols = st.columns(len(class_counts))
+     for i, label in enumerate(class_counts['Class']):
+         sample_row = df[df['label'] == label].iloc[0]  # First image of this class
+         img_path = os.path.join(IMG_FOLDER, sample_row['id_code'])  # Assumes 'id_code' already includes the file extension
+         if os.path.exists(img_path):
+             image = Image.open(img_path)
+             cols[i].image(image, caption=label, use_container_width=True)
+         else:
+             cols[i].write(f"Image not found: {sample_row['id_code']}")
+
+     # --- Metric 3: Image Size Distribution ---
+     st.subheader("3️⃣ Image Size Distribution")
+
+     image_sizes = []
+
+     # Check only a few images per class for speed
+     for label in class_counts['Class']:
+         sample_paths = df[df['label'] == label]['id_code'][:5]  # 5 images per class
+         for img_code in sample_paths:
+             img_path = os.path.join(IMG_FOLDER, str(img_code))
              if os.path.exists(img_path):
+                 try:
+                     with Image.open(img_path) as img:
+                         image_sizes.append(img.size)
+                 except Exception as e:
+                     st.warning(f"Error loading image {img_code}: {e}")
+
+     if image_sizes:
+         widths, heights = zip(*image_sizes)
+         fig2, ax2 = plt.subplots()
+         sns.histplot(widths, kde=True, label="Width", color="blue", ax=ax2)
+         sns.histplot(heights, kde=True, label="Height", color="green", ax=ax2)
+         ax2.legend()
+         ax2.set_title("Image Size Distribution")
+         st.pyplot(fig2)
+     else:
+         st.info("No image size data available. Check your paths.")
+
+ # =============================
+ # Tab 3: Algorithm Used
+ # =============================
+ with tab3:
+     st.markdown("""
+     ### 🤖 Model and Algorithm
+
+     We used **Transfer Learning** with **ResNet50** for DR classification.
+
+     #### 🏗️ Model Details:
+     - Input Image Size: **224x224**
+     - Pretrained on **ImageNet**
+     - Optimizer: **Adam**
+     - Loss Function: **Categorical Crossentropy**
+     - Evaluation Metrics: **Accuracy**, **Precision**, **Recall**
+
+     This architecture suits medical image analysis: its residual connections let a deep network train reliably, and the ImageNet-pretrained weights reduce overfitting on a relatively small labeled dataset.
+     """)