Sathwikchowdary committed
Commit 578a444 · verified · 1 Parent(s): 0683b2a

Create 2Decision-Tree.py

Files changed (1)
  1. pages/2Decision-Tree.py +141 -0
pages/2Decision-Tree.py ADDED
@@ -0,0 +1,141 @@
+ import streamlit as st
+
+ # Set page configuration
+ st.set_page_config(page_title="Decision Tree Theory", layout="wide")
+
+ # Custom CSS for styling
+ st.markdown("""
+ <style>
+     .stApp {
+         background: linear-gradient(135deg, #1e3c72, #2a5298);
+     }
+     h1, h2 {
+         color: #fdfdfd;
+     }
+     p, li {
+         font-family: 'Arial', sans-serif;
+         font-size: 18px;
+         color: #f0f0f0;
+         line-height: 1.6;
+     }
+ </style>
+ """, unsafe_allow_html=True)
+
+ # Title
+ st.markdown("<h1>Decision Tree</h1>", unsafe_allow_html=True)
+
+ # Introduction
+ st.markdown("""
+ A **Decision Tree** is a supervised learning method used for both classification and regression. It models decisions in a tree structure, where:
+ - The **Root Node** represents the full dataset.
+ - **Internal Nodes** evaluate features to split the data.
+ - **Leaf Nodes** give the output label or value.
+
+ It's like asking a series of "yes or no" questions to reach a final decision.
+ """, unsafe_allow_html=True)
+
+ # Entropy
+ st.markdown("<h2>Entropy: Quantifying Disorder</h2>", unsafe_allow_html=True)
+ st.markdown("""
+ **Entropy** measures the randomness or impurity in data.
+
+ The formula for entropy is:
+ """)
+ st.image("entropy-formula-2.jpg", width=300)
+ # Raw string so the LaTeX backslashes (\cdot, \log_2) are not read as escapes
+ st.markdown(r"""
+ If you have two classes (Yes/No), each with a 50% chance:
+
+ $$ H(Y) = - (0.5 \cdot \log_2(0.5) + 0.5 \cdot \log_2(0.5)) = 1 $$
+
+ This means maximum uncertainty.
+ """, unsafe_allow_html=True)
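+
+ # A minimal sketch (NumPy assumed to be available): checking the 50/50
+ # entropy example above numerically.
+ import numpy as np
+
+ entropy_probs = np.array([0.5, 0.5])  # P(Yes), P(No)
+ entropy_value = -np.sum(entropy_probs * np.log2(entropy_probs))
+ st.code(f"-sum(p * log2(p)) for p = [0.5, 0.5]  ->  {entropy_value:.1f} bit", language=None)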
+
+ # Gini Impurity
+ st.markdown("<h2>Gini Impurity: Measuring Purity</h2>", unsafe_allow_html=True)
+ st.markdown("""
+ **Gini Impurity** is another metric that measures how often a randomly chosen element would be incorrectly classified.
+
+ The formula is:
+ """)
+ st.image("gini.png", width=300)
+ st.markdown("""
+ With 50% Yes and 50% No:
+
+ $$ Gini(Y) = 1 - (0.5^2 + 0.5^2) = 0.5 $$
+
+ A lower Gini means more purity.
+ """, unsafe_allow_html=True)
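+
+ # A minimal sketch (NumPy assumed to be available): checking the 50/50
+ # Gini example above numerically.
+ import numpy as np
+
+ gini_probs = np.array([0.5, 0.5])  # P(Yes), P(No)
+ gini_value = 1 - np.sum(gini_probs ** 2)
+ st.code(f"1 - sum(p**2) for p = [0.5, 0.5]  ->  {gini_value:.1f}", language=None)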
+
+ # Construction of Decision Tree
+ st.markdown("<h2>How a Decision Tree is Built</h2>", unsafe_allow_html=True)
+ st.markdown("""
+ The tree grows top-down, choosing the best feature at each step based on how well it splits the data. The process ends when:
+ - All samples in a node are of one class.
+ - A stopping condition like max depth is reached.
+ """, unsafe_allow_html=True)
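+
+ # A minimal sketch (scikit-learn assumed to be available): growing a small
+ # tree on the Iris data and showing the learned splits with export_text.
+ from sklearn.datasets import load_iris
+ from sklearn.tree import DecisionTreeClassifier, export_text
+
+ iris = load_iris()
+ demo_tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=42)
+ demo_tree.fit(iris.data, iris.target)
+ st.code(export_text(demo_tree, feature_names=list(iris.feature_names)), language=None)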
+
+ # Iris Dataset
+ st.markdown("<h2>Iris Dataset Example</h2>", unsafe_allow_html=True)
+ st.markdown("""
+ This tree is trained on the famous **Iris dataset**, where features like petal length help classify the flower species.
+ """, unsafe_allow_html=True)
+ st.image("dt1 (1).jpg", caption="Decision Tree for Iris Dataset", use_container_width=True)
+
+ # Training & Testing - Classification
+ st.markdown("<h2>Training & Testing: Classification</h2>", unsafe_allow_html=True)
+ st.markdown("""
+ - During **training**, the model learns rules from labeled data using Gini or Entropy.
+ - In the **testing phase**, new samples are passed through the tree to make predictions.
+
+ Example: predicting an Iris flower's species from its features.
+ """, unsafe_allow_html=True)
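+
+ # A minimal sketch of the train/test workflow described above, on a held-out
+ # split of the Iris data (scikit-learn assumed to be available).
+ from sklearn.datasets import load_iris
+ from sklearn.metrics import accuracy_score
+ from sklearn.model_selection import train_test_split
+ from sklearn.tree import DecisionTreeClassifier
+
+ X, y = load_iris(return_X_y=True)
+ X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
+ clf = DecisionTreeClassifier(criterion="entropy", random_state=42).fit(X_train, y_train)
+ st.write("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))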
+
+ # Training & Testing - Regression
+ st.markdown("<h2>Training & Testing: Regression</h2>", unsafe_allow_html=True)
+ st.markdown("""
+ - For regression, the tree splits data to reduce **Mean Squared Error (MSE)**.
+ - Each leaf node predicts a continuous value (e.g., house price).
+
+ Example: Predicting house prices based on area, number of rooms, etc.
+ """, unsafe_allow_html=True)
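+
+ # A minimal sketch of tree regression with squared-error (MSE) splits, using
+ # the bundled Diabetes dataset as a stand-in for house prices (scikit-learn
+ # assumed to be available).
+ from sklearn.datasets import load_diabetes
+ from sklearn.metrics import mean_squared_error
+ from sklearn.model_selection import train_test_split
+ from sklearn.tree import DecisionTreeRegressor
+
+ X_reg, y_reg = load_diabetes(return_X_y=True)
+ Xr_train, Xr_test, yr_train, yr_test = train_test_split(X_reg, y_reg, random_state=42)
+ reg = DecisionTreeRegressor(criterion="squared_error", max_depth=3, random_state=42)
+ reg.fit(Xr_train, yr_train)
+ st.write("Test MSE:", mean_squared_error(yr_test, reg.predict(Xr_test)))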
+
+ # Pre-Pruning
+ st.markdown("<h2>Controlling Overfitting: Pre-Pruning</h2>", unsafe_allow_html=True)
+ st.markdown("""
+ **Pre-pruning** stops the tree from growing too large.
+
+ Techniques:
+ - **Max Depth**: Limits how deep the tree can go.
+ - **Min Samples Split**: Minimum data points needed to split a node.
+ - **Min Samples Leaf**: Minimum data points required in a leaf.
+ - **Max Features**: Restricts number of features used per split.
+ """, unsafe_allow_html=True)
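+
+ # A minimal sketch: the pre-pruning controls listed above map to
+ # DecisionTreeClassifier constructor arguments in scikit-learn (the values
+ # here are illustrative, not tuned).
+ from sklearn.datasets import load_iris
+ from sklearn.tree import DecisionTreeClassifier
+
+ X_pp, y_pp = load_iris(return_X_y=True)
+ pre_pruned = DecisionTreeClassifier(
+     max_depth=4,           # Max Depth
+     min_samples_split=10,  # Min Samples Split
+     min_samples_leaf=5,    # Min Samples Leaf
+     max_features="sqrt",   # Max Features
+     random_state=42,
+ ).fit(X_pp, y_pp)
+ st.write("Depth of the pre-pruned tree:", pre_pruned.get_depth())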
+
+ # Post-Pruning
+ st.markdown("<h2>Post-Pruning: Simplifying After Training</h2>", unsafe_allow_html=True)
+ st.markdown("""
+ **Post-pruning** trims the tree **after** full training to reduce complexity.
+
+ Methods:
+ - **Cost Complexity Pruning**
+ - **Validation Set Pruning**
+ """, unsafe_allow_html=True)
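+
+ # A minimal sketch of cost complexity pruning in scikit-learn: compute the
+ # candidate alphas, then refit with a chosen ccp_alpha (the second-largest
+ # alpha is picked here purely for illustration).
+ from sklearn.datasets import load_iris
+ from sklearn.tree import DecisionTreeClassifier
+
+ X_cc, y_cc = load_iris(return_X_y=True)
+ path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_cc, y_cc)
+ pruned = DecisionTreeClassifier(ccp_alpha=path.ccp_alphas[-2], random_state=42).fit(X_cc, y_cc)
+ st.write("Leaves after pruning:", pruned.get_n_leaves())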
+
+ # Feature Selection
+ st.markdown("<h2>Feature Selection with Trees</h2>", unsafe_allow_html=True)
+ st.markdown("""
+ Decision Trees can rank features by how much they reduce impurity at each split.
+
+ Here's the formula used:
+ """)
+ st.image("feature.png", width=500)
+ st.markdown("""
+ The higher the score, the more important the feature.
+ """, unsafe_allow_html=True)
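+
+ # A minimal sketch: impurity-based importance scores are exposed by the
+ # feature_importances_ attribute of a fitted tree (scikit-learn assumed).
+ from sklearn.datasets import load_iris
+ from sklearn.tree import DecisionTreeClassifier
+
+ fi_data = load_iris()
+ fi_tree = DecisionTreeClassifier(random_state=42).fit(fi_data.data, fi_data.target)
+ for fi_name, fi_score in zip(fi_data.feature_names, fi_tree.feature_importances_):
+     st.write(f"{fi_name}: {fi_score:.3f}")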
+
+ # Implementation Link
+ st.markdown("<h2>Try It Yourself</h2>", unsafe_allow_html=True)
+ st.markdown(
+     "<a href='https://colab.research.google.com/drive/1SqZ5I5h7ivS6SJDwlOZQ-V4IAOg90RE7?usp=sharing' target='_blank' style='font-size: 16px; color: #add8e6;'>Open the notebook in Google Colab</a>",
+     unsafe_allow_html=True
+ )