added the dataset table
- pages/Dataset.py +32 -40
- training/training.ipynb +1 -1
pages/Dataset.py
CHANGED
@@ -4,6 +4,7 @@ import os
 from PIL import Image
 import matplotlib.pyplot as plt
 import seaborn as sns
+import numpy as np
 
 st.set_page_config(layout="wide")
 st.title("🩺 Diabetic Retinopathy Project")
@@ -21,16 +22,15 @@ with tab1:
 **Dataset Description:**
 
 The DDR dataset contains **13,673 fundus images** from **147 hospitals** across **23 provinces in China**. The images are labeled into 5 classes based on DR severity:
-
 - **No_DR**
 - **Mild**
 - **Moderate**
 - **Severe**
 - **Proliferative_DR**
 
-Poor-quality images were removed, and black backgrounds were deleted.
+Poor-quality images were removed, and black backgrounds were deleted. **12,521 images left**
 [📎 Dataset source](https://www.kaggle.com/datasets/mariaherrerot/ddrdataset)
-
+
 ### 🧪 Data Preparation & Splitting
 
 - All images resized to **224x224**
@@ -60,7 +60,11 @@ with tab2:
 }
 df['label'] = df['diagnosis'].map(label_map)
 
-# --- Metric 1:
+# --- Metric 1: Full Dataset Table ---
+st.subheader("3️⃣ Full Dataset Table")
+st.dataframe(df, use_container_width=True)
+
+# --- Metric 2: Class Distribution ---
 st.subheader("1️⃣ Class Distribution")
 class_counts = df['label'].value_counts().reset_index()
 class_counts.columns = ['Class', 'Count']
@@ -70,7 +74,7 @@ with tab2:
 ax1.set_title("Class Distribution")
 st.pyplot(fig1)
 
-# --- Metric
+# --- Metric 3: Sample Images Per Class ---
 st.subheader("2️⃣ Sample Images Per Class")
 
 cols = st.columns(len(class_counts))
@@ -82,36 +86,6 @@ with tab2:
     cols[i].image(image, caption=label, use_container_width=True)
 else:
     cols[i].write(f"Image not found: {sample_row['id_code']}")
-
-# --- Metric 3: Image Size Distribution ---
-st.subheader("3️⃣ Image Size Distribution")
-
-image_sizes = []
-
-# Check a few images per class for speed
-for label in class_counts['Class']:
-    sample_paths = df[df['label'] == label]['id_code'][:5]  # 5 images per class
-    for img_code in sample_paths:
-        img_path = os.path.join(IMG_FOLDER, str(img_code))  # Assuming image filenames are id_code.png
-        if os.path.exists(img_path):
-            try:
-                with Image.open(img_path) as img:
-                    image_sizes.append(img.size)
-            except Exception as e:
-                st.warning(f"Error loading image {img_code}: {e}")
-                pass
-
-if image_sizes:
-    widths, heights = zip(*image_sizes)
-    fig2, ax2 = plt.subplots()
-    sns.histplot(widths, kde=True, label="Width", color="blue")
-    sns.histplot(heights, kde=True, label="Height", color="green")
-    ax2.legend()
-    ax2.set_title("Image Size Distribution")
-    st.pyplot(fig2)
-else:
-    st.info("No image size data available. Check your paths.")
-
 # =============================
 # Tab 3: Algorithm Used
 # =============================
@@ -119,14 +93,32 @@ with tab3:
 st.markdown("""
 ### 🤖 Model and Algorithm
 
-We used **Transfer Learning** with **
+We used **Transfer Learning** with **DenseNet121** for DR classification.
 
 #### 🏗️ Model Details:
+- Model: **DenseNet121** (pretrained on **ImageNet**)
 - Input Image Size: **224x224**
--
-- Optimizer: **
+- Batch Size: **32**
+- Optimizer: **AdamW** (learning rate = **1e-3**)
 - Loss Function: **Categorical Crossentropy**
 - Evaluation Metrics: **Accuracy**, **Precision**, **Recall**
 
-
-
+#### 📊 Evaluation Results:
+- **Top-1 Accuracy:** 85.0%
+- **Top-2 Accuracy:** 84.9%
+- **Top-3 Accuracy:** 84.6%
+
+#### 🖥️ Training Environment:
+- **Operating System:** Windows
+- **Hardware:** CPU only (no GPU)
+- **Epochs:** 15
+- **Training Time:** ~41 minutes per epoch
+
+Since the training was done on a CPU, it was slower compared to using a GPU.
+Because of this, we only trained for 15 epochs to save time.
+
+DenseNet121 was selected because it passes features directly to deeper layers,
+which helps improve learning and reduces overfitting — especially useful in medical images like eye scans.
+https://www.researchgate.net/publication/373171778_Deep_learning-enhanced_diabetic_retinopathy_image_classification
+""")
+
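For reference, a minimal sketch of the transfer-learning setup listed in the Model Details above (DenseNet121 pretrained on ImageNet, 224x224 inputs, AdamW at 1e-3, categorical crossentropy, accuracy/precision/recall). The commit itself does not include the training code, so the framework choice (TensorFlow/Keras) and the `train_ds`/`val_ds` dataset names are assumptions, not the repo's actual code.

```python
# Sketch only: DenseNet121 transfer learning as described in the page text.
# Assumes TensorFlow >= 2.11 (for tf.keras.optimizers.AdamW) and two tf.data
# datasets, train_ds and val_ds, yielding (image, one-hot label) batches of 32.
import tensorflow as tf

NUM_CLASSES = 5  # No_DR, Mild, Moderate, Severe, Proliferative_DR

# ImageNet-pretrained backbone, classifier head removed
base = tf.keras.applications.DenseNet121(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3)
)
base.trainable = False  # keep pretrained features frozen for transfer learning

inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.densenet.preprocess_input(inputs)
x = base(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(
    optimizer=tf.keras.optimizers.AdamW(learning_rate=1e-3),
    loss="categorical_crossentropy",
    metrics=[
        "accuracy",
        tf.keras.metrics.Precision(name="precision"),
        tf.keras.metrics.Recall(name="recall"),
    ],
)

# model.fit(train_ds, validation_data=val_ds, epochs=15)
```

Freezing the backbone keeps the per-epoch cost low, which fits the CPU-only, 15-epoch training budget described above.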
training/training.ipynb
CHANGED
@@ -271,7 +271,7 @@
 "id": "1e34f571",
 "metadata": {},
 "source": [
-"#### For the ESRGAN if applicable"
+"#### For the ESRGAN if applicable (Future)"
 ]
 },
 {