3v324v23 committed on
Commit 198f3b5 · 1 Parent(s): 1b4d479

added the dataset table

Files changed (2)
  1. pages/Dataset.py +32 -40
  2. training/training.ipynb +1 -1
pages/Dataset.py CHANGED
@@ -4,6 +4,7 @@ import os
4
  from PIL import Image
5
  import matplotlib.pyplot as plt
6
  import seaborn as sns
 
7
 
8
  st.set_page_config(layout="wide")
9
  st.title("🩺 Diabetic Retinopathy Project")
@@ -21,16 +22,15 @@ with tab1:
21
  **Dataset Description:**
22
 
23
  The DDR dataset contains **13,673 fundus images** from **147 hospitals** across **23 provinces in China**. The images are labeled into 5 classes based on DR severity:
24
-
25
  - **No_DR**
26
  - **Mild**
27
  - **Moderate**
28
  - **Severe**
29
  - **Proliferative_DR**
30
 
31
- Poor-quality images were removed, and black backgrounds were deleted.
32
  [📎 Dataset source](https://www.kaggle.com/datasets/mariaherrerot/ddrdataset)
33
-
34
  ### 🧪 Data Preparation & Splitting
35
 
36
  - All images resized to **224x224**
@@ -60,7 +60,11 @@ with tab2:
60
  }
61
  df['label'] = df['diagnosis'].map(label_map)
62
 
63
- # --- Metric 1: Class Distribution ---
64
  st.subheader("1️⃣ Class Distribution")
65
  class_counts = df['label'].value_counts().reset_index()
66
  class_counts.columns = ['Class', 'Count']
@@ -70,7 +74,7 @@ with tab2:
70
  ax1.set_title("Class Distribution")
71
  st.pyplot(fig1)
72
 
73
- # --- Metric 2: Sample Images Per Class ---
74
  st.subheader("2️⃣ Sample Images Per Class")
75
 
76
  cols = st.columns(len(class_counts))
@@ -82,36 +86,6 @@ with tab2:
82
  cols[i].image(image, caption=label, use_container_width=True)
83
  else:
84
  cols[i].write(f"Image not found: {sample_row['id_code']}")
85
-
86
- # --- Metric 3: Image Size Distribution ---
87
- st.subheader("3️⃣ Image Size Distribution")
88
-
89
- image_sizes = []
90
-
91
- # Check a few images per class for speed
92
- for label in class_counts['Class']:
93
- sample_paths = df[df['label'] == label]['id_code'][:5] # 5 images per class
94
- for img_code in sample_paths:
95
- img_path = os.path.join(IMG_FOLDER, str(img_code)) # Assuming image filenames are id_code.png
96
- if os.path.exists(img_path):
97
- try:
98
- with Image.open(img_path) as img:
99
- image_sizes.append(img.size)
100
- except Exception as e:
101
- st.warning(f"Error loading image {img_code}: {e}")
102
- pass
103
-
104
- if image_sizes:
105
- widths, heights = zip(*image_sizes)
106
- fig2, ax2 = plt.subplots()
107
- sns.histplot(widths, kde=True, label="Width", color="blue")
108
- sns.histplot(heights, kde=True, label="Height", color="green")
109
- ax2.legend()
110
- ax2.set_title("Image Size Distribution")
111
- st.pyplot(fig2)
112
- else:
113
- st.info("No image size data available. Check your paths.")
114
-
115
  # =============================
116
  # Tab 3: Algorithm Used
117
  # =============================
@@ -119,14 +93,32 @@ with tab3:
119
  st.markdown("""
120
  ### 🤖 Model and Algorithm
121
 
122
- We used **Transfer Learning** with **ResNet50** for DR classification.
123
 
124
  #### 🏗️ Model Details:
 
125
  - Input Image Size: **224x224**
126
- - Pretrained on **ImageNet**
127
- - Optimizer: **Adam**
128
  - Loss Function: **Categorical Crossentropy**
129
  - Evaluation Metrics: **Accuracy**, **Precision**, **Recall**
130
 
131
- This architecture is ideal for medical image analysis due to its deep layers and robustness to overfitting.
132
- """)
4
  from PIL import Image
5
  import matplotlib.pyplot as plt
6
  import seaborn as sns
7
+ import numpy as np
8
 
9
  st.set_page_config(layout="wide")
10
  st.title("🩺 Diabetic Retinopathy Project")
 
22
  **Dataset Description:**
23
 
24
  The DDR dataset contains **13,673 fundus images** from **147 hospitals** across **23 provinces in China**. The images are labeled into 5 classes based on DR severity:
 
25
  - **No_DR**
26
  - **Mild**
27
  - **Moderate**
28
  - **Severe**
29
  - **Proliferative_DR**
30
 
31
+ Poor-quality images were removed and black backgrounds were deleted, leaving **12,521 images**.
32
  [📎 Dataset source](https://www.kaggle.com/datasets/mariaherrerot/ddrdataset)
33
+
34
  ### 🧪 Data Preparation & Splitting
35
 
36
  - All images resized to **224x224**
 
60
  }
61
  df['label'] = df['diagnosis'].map(label_map)
62
 
63
+ # --- Metric 1: Full Dataset Table ---
64
+ st.subheader("3️⃣ Full Dataset Table")
65
+ st.dataframe(df, use_container_width=True)
66
+
67
+ # --- Metric 2: Class Distribution ---
68
  st.subheader("1️⃣ Class Distribution")
69
  class_counts = df['label'].value_counts().reset_index()
70
  class_counts.columns = ['Class', 'Count']
 
74
  ax1.set_title("Class Distribution")
75
  st.pyplot(fig1)
76
 
77
+ # --- Metric 3: Sample Images Per Class ---
78
  st.subheader("2️⃣ Sample Images Per Class")
79
 
80
  cols = st.columns(len(class_counts))
 
86
  cols[i].image(image, caption=label, use_container_width=True)
87
  else:
88
  cols[i].write(f"Image not found: {sample_row['id_code']}")
89
  # =============================
90
  # Tab 3: Algorithm Used
91
  # =============================
 
93
  st.markdown("""
94
  ### 🤖 Model and Algorithm
95
 
96
+ We used **Transfer Learning** with **DenseNet121** for DR classification.
97
 
98
  #### 🏗️ Model Details:
99
+ - Model: **DenseNet121** (pretrained on **ImageNet**)
100
  - Input Image Size: **224x224**
101
+ - Batch Size: **32**
102
+ - Optimizer: **AdamW** (learning rate = **1e-3**)
103
  - Loss Function: **Categorical Crossentropy**
104
  - Evaluation Metrics: **Accuracy**, **Precision**, **Recall**
105
 
106
+ #### 📊 Evaluation Results:
107
+ - **Top-1 Accuracy:** 85.0%
108
+ - **Top-2 Accuracy:** 84.9%
109
+ - **Top-3 Accuracy:** 84.6%
110
+
111
+ #### 🖥️ Training Environment:
112
+ - **Operating System:** Windows
113
+ - **Hardware:** CPU only (no GPU)
114
+ - **Epochs:** 15
115
+ - **Training Time:** ~41 minutes per epoch
116
+
117
+ Because training ran on a CPU, it was slower than it would have been on a GPU,
118
+ so we limited training to 15 epochs to save time.
119
+
120
+ DenseNet121 was selected because its dense connections pass each layer's feature maps directly to all deeper layers,
121
+ which encourages feature reuse, improves gradient flow, and can reduce overfitting; this is useful for medical images such as fundus scans.
122
+ https://www.researchgate.net/publication/373171778_Deep_learning-enhanced_diabetic_retinopathy_image_classification
123
+ """)
124
+
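The evaluation results added in the hunk above report Top-1, Top-2, and Top-3 accuracy. As a minimal sketch of how top-k accuracy is usually computed, assuming the model outputs one score per class for each image, the snippet below counts a sample as correct when its true label is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy scores are hypothetical and are not taken from this commit or from `training/training.ipynb`.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int,
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    hits = 0
    for row, true_label in zip(scores, labels):
        # Class indices sorted by descending score; ties keep the lower index first.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if true_label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical scores for four images over the five DR classes
# (No_DR, Mild, Moderate, Severe, Proliferative_DR); not real model output.
toy_scores = [
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.20, 0.50, 0.30, 0.00, 0.00],
    [0.40, 0.10, 0.30, 0.15, 0.05],
    [0.10, 0.10, 0.10, 0.10, 0.60],
]
toy_labels = [0, 2, 3, 0]

for k in (1, 2, 3):
    print(f"top-{k} accuracy: {top_k_accuracy(toy_scores, toy_labels, k):.2f}")
```

Every top-k hit is also a top-(k+1) hit, so under this definition accuracy can only stay the same or increase as k grows. That property is a quick sanity check on any reported Top-1/Top-2/Top-3 figures.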
training/training.ipynb CHANGED
@@ -271,7 +271,7 @@
271
  "id": "1e34f571",
272
  "metadata": {},
273
  "source": [
274
- "#### For the ESRGAN if applicable"
275
  ]
276
  },
277
  {
 
271
  "id": "1e34f571",
272
  "metadata": {},
273
  "source": [
274
+ "#### For the ESRGAN if applicable (Future)"
275
  ]
276
  },
277
  {