Create 2Decision-Tree.py
pages/2Decision-Tree.py +141 -0
pages/2Decision-Tree.py
ADDED
@@ -0,0 +1,141 @@
import streamlit as st

# Set page configuration
st.set_page_config(page_title="Decision Tree Theory", layout="wide")

# Custom CSS for styling
st.markdown("""
<style>
.stApp {
    background: linear-gradient(135deg, #1e3c72, #2a5298);
}
h1, h2 {
    color: #fdfdfd;
}
p, li {
    font-family: 'Arial', sans-serif;
    font-size: 18px;
    color: #f0f0f0;
    line-height: 1.6;
}
</style>
""", unsafe_allow_html=True)

# Title
st.markdown("<h1>Decision Tree</h1>", unsafe_allow_html=True)

# Introduction
st.markdown("""
A **Decision Tree** is a supervised learning method used for both classification and regression. It models decisions in a tree structure, where:
- The **Root Node** represents the full dataset.
- **Internal Nodes** evaluate features to split the data.
- **Leaf Nodes** give the output label or value.

It's like asking a series of "yes or no" questions to reach a final decision.
""", unsafe_allow_html=True)

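# Illustrative snippet for the reader (rendered with st.code, not executed):
# a hand-written tree of yes/no questions showing the root, internal, and
# leaf roles described above. The thresholds are made up for illustration.
st.code('''
def classify_flower(petal_length, petal_width):
    # Root node: the first yes/no question on a feature
    if petal_length < 2.5:
        return "setosa"        # leaf node: a final label
    # Internal node: a follow-up question refines the split
    if petal_width < 1.8:
        return "versicolor"    # leaf node
    return "virginica"         # leaf node
''', language="python")
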
# Entropy
st.markdown("<h2>Entropy: Quantifying Disorder</h2>", unsafe_allow_html=True)
st.markdown("""
**Entropy** measures the randomness or impurity in a set of labels.

The formula for entropy is:
""")
st.image("entropy-formula-2.jpg", width=300)
st.markdown(r"""
If you have two classes (Yes/No), each with a 50% chance:

$$ H(Y) = - (0.5 \cdot \log_2(0.5) + 0.5 \cdot \log_2(0.5)) = 1 $$

This means maximum uncertainty: a fair two-way split carries a full 1 bit of entropy.
""", unsafe_allow_html=True)

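# A minimal sketch of the entropy formula in code (rendered with st.code,
# not executed here; assumes NumPy wherever the reader runs it).
st.code('''
import numpy as np

def entropy(probs):
    # H(Y) = -sum(p_i * log2(p_i)); zero-probability classes contribute 0
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

print(entropy([0.5, 0.5]))  # 1.0 -> maximum uncertainty
print(entropy([1.0, 0.0]))  # 0.0 -> a pure node
''', language="python")
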
# Gini Impurity
st.markdown("<h2>Gini Impurity: Measuring Purity</h2>", unsafe_allow_html=True)
st.markdown("""
**Gini Impurity** is another metric: it measures how often a randomly chosen element would be incorrectly classified if it were labeled at random according to the class distribution.

The formula is:
""")
st.image("gini.png", width=300)
st.markdown(r"""
With 50% Yes and 50% No:

$$ \mathrm{Gini}(Y) = 1 - (0.5^2 + 0.5^2) = 0.5 $$

A lower Gini value means a purer node.
""", unsafe_allow_html=True)

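# The same 50/50 example for Gini impurity, as a plain-Python sketch
# (rendered with st.code, not executed here).
st.code('''
def gini(probs):
    # Gini(Y) = 1 - sum(p_i^2)
    return 1.0 - sum(p * p for p in probs)

print(gini([0.5, 0.5]))  # 0.5 -> maximally mixed for two classes
print(gini([1.0, 0.0]))  # 0.0 -> perfectly pure
''', language="python")
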
# Construction of Decision Tree
st.markdown("<h2>How a Decision Tree is Built</h2>", unsafe_allow_html=True)
st.markdown("""
The tree grows top-down: at each step it chooses the feature (and threshold) whose split best separates the data, e.g. by maximizing information gain or the decrease in Gini impurity. The process ends when:
- All samples in a node are of one class.
- A stopping condition, such as a maximum depth, is reached.
""", unsafe_allow_html=True)

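# A sketch of the greedy split search described above (illustration only;
# real libraries use faster and more careful implementations).
st.code('''
import numpy as np

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_split(X, y):
    # Try every feature/threshold pair; keep the split with the largest
    # information gain (parent entropy minus weighted child entropy).
    base, best = entropy(y), (None, None, 0.0)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            mask = X[:, f] <= t
            left, right = y[mask], y[~mask]
            if len(left) == 0 or len(right) == 0:
                continue
            child = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
            if base - child > best[2]:
                best = (f, t, base - child)
    return best  # (feature index, threshold, information gain)
''', language="python")
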
# Iris Dataset
st.markdown("<h2>Iris Dataset Example</h2>", unsafe_allow_html=True)
st.markdown("""
This tree is trained on the famous **Iris dataset**, where features like petal length help classify the flower species.
""", unsafe_allow_html=True)
st.image("dt1 (1).jpg", caption="Decision Tree for Iris Dataset", use_container_width=True)

# Training & Testing - Classification
st.markdown("<h2>Training & Testing: Classification</h2>", unsafe_allow_html=True)
st.markdown("""
- During **training**, the model learns decision rules from labeled data, using Gini impurity or entropy to score candidate splits.
- In the **testing phase**, new samples are passed down the tree to make predictions.

Example: Predict Iris species based on its features, as shown in the snippet below.
""", unsafe_allow_html=True)

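# A worked classification example with scikit-learn (rendered with st.code,
# not executed by this page; assumes scikit-learn is installed).
st.code('''
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = DecisionTreeClassifier(criterion="gini")      # or criterion="entropy"
clf.fit(X_train, y_train)                           # training phase
print(accuracy_score(y_test, clf.predict(X_test)))  # testing phase
''', language="python")
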
# Training & Testing - Regression
st.markdown("<h2>Training & Testing: Regression</h2>", unsafe_allow_html=True)
st.markdown("""
- For regression, the tree splits data to reduce **Mean Squared Error (MSE)**.
- Each leaf node predicts a continuous value (e.g., a house price), typically the mean of the training targets that reach it.

Example: Predicting house prices based on area, number of rooms, etc.
""", unsafe_allow_html=True)

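# A regression counterpart on tiny housing data (rendered with st.code,
# not executed; the numbers are invented for illustration).
st.code('''
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Made-up data: [area in sq ft, number of rooms] -> price
X = np.array([[1000, 2], [1500, 3], [2000, 3], [2500, 4], [3000, 5]])
y = np.array([150_000, 200_000, 250_000, 300_000, 360_000])

reg = DecisionTreeRegressor(max_depth=2)   # splits chosen to reduce MSE
reg.fit(X, y)
print(reg.predict([[1800, 3]]))            # a leaf outputs a continuous value
print(mean_squared_error(y, reg.predict(X)))
''', language="python")
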
# Pre-Pruning
st.markdown("<h2>Controlling Overfitting: Pre-Pruning</h2>", unsafe_allow_html=True)
st.markdown("""
**Pre-pruning** stops the tree from growing too large while it is being built.

Techniques:
- **Max Depth**: Limits how deep the tree can go.
- **Min Samples Split**: Minimum number of data points needed to split a node.
- **Min Samples Leaf**: Minimum number of data points required in a leaf.
- **Max Features**: Restricts the number of features considered at each split.
""", unsafe_allow_html=True)

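# How the pre-pruning knobs above map onto scikit-learn's constructor
# arguments (a sketch; the values are arbitrary examples).
st.code('''
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    max_depth=3,           # Max Depth
    min_samples_split=10,  # Min Samples Split
    min_samples_leaf=5,    # Min Samples Leaf
    max_features=2,        # Max Features
)
''', language="python")
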
# Post-Pruning
st.markdown("<h2>Post-Pruning: Simplifying After Training</h2>", unsafe_allow_html=True)
st.markdown("""
**Post-pruning** trims the tree **after** full training to reduce complexity.

Methods:
- **Cost Complexity Pruning**: removes subtrees whose contribution does not justify their size, controlled by a penalty parameter (often called alpha).
- **Validation Set Pruning**: prunes branches that do not improve accuracy on held-out data.
""", unsafe_allow_html=True)

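# Cost complexity pruning in scikit-learn (a sketch; for validation-set
# pruning one would instead pick the alpha that scores best on held-out data).
st.code('''
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate pruning strengths; larger ccp_alpha prunes more aggressively
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
pruned = DecisionTreeClassifier(random_state=0,
                                ccp_alpha=path.ccp_alphas[-2]).fit(X, y)
print(pruned.get_n_leaves())
''', language="python")
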
# Feature Selection
st.markdown("<h2>Feature Selection with Trees</h2>", unsafe_allow_html=True)
st.markdown("""
Decision Trees can rank features by how much they reduce impurity across the splits in which they are used.

Here's the formula used:
""")
st.image("feature.png", width=500)
st.markdown("""
The higher the score, the more important the feature.
""", unsafe_allow_html=True)

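# Reading impurity-based importances off a fitted tree (a sketch using
# scikit-learn's feature_importances_ attribute; rendered, not executed).
st.code('''
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
clf = DecisionTreeClassifier(random_state=0).fit(data.data, data.target)

# Each score is the total (weighted) impurity decrease due to that feature
for name, score in zip(data.feature_names, clf.feature_importances_):
    print(name, round(score, 3))
''', language="python")
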
# Implementation Link
st.markdown("<h2>Try It Yourself</h2>", unsafe_allow_html=True)
st.markdown(
    "<a href='https://colab.research.google.com/drive/1SqZ5I5h7ivS6SJDwlOZQ-V4IAOg90RE7?usp=sharing' target='_blank' style='font-size: 16px; color: #add8e6;'>Open Jupyter Notebook</a>",
    unsafe_allow_html=True
)