import streamlit as st

# Set page configuration
st.set_page_config(page_title="Decision Tree Theory", layout="wide")

# Custom CSS for styling
st.markdown("""
    <style>
        .stApp {
            background: linear-gradient(135deg, #1e3c72, #2a5298);
        }
        h1, h2 {
            color: #fdfdfd;
        }
        p, li {
            font-family: 'Arial', sans-serif;
            font-size: 18px;
            color: #f0f0f0;
            line-height: 1.6;
        }
    </style>
""", unsafe_allow_html=True)

# Title
st.markdown("<h1>Decision Tree</h1>", unsafe_allow_html=True)

# Introduction
st.markdown("""
A **Decision Tree** is a supervised learning method used for both classification and regression. It models decisions in a tree structure, where:
- The **Root Node** represents the full dataset.
- **Internal Nodes** evaluate features to split the data.
- **Leaf Nodes** give the output label or value.

It's like asking a series of "yes or no" questions to reach a final decision.
""", unsafe_allow_html=True)

# Entropy
st.markdown("<h2>Entropy: Quantifying Disorder</h2>", unsafe_allow_html=True)
st.markdown("""
**Entropy** helps measure randomness or impurity in data.

The formula for entropy is:
""")
st.image("entropy-formula-2.jpg", width=300)
st.markdown("""
If you have two classes (Yes/No) each with a 50% chance:

$$ H(Y) = - (0.5 \cdot \log_2(0.5) + 0.5 \cdot \log_2(0.5)) = 1 $$

This means maximum uncertainty.
""", unsafe_allow_html=True)

# Gini Impurity
st.markdown("<h2>Gini Impurity: Measuring Purity</h2>", unsafe_allow_html=True)
st.markdown("""
**Gini Impurity** is another metric that measures how often a randomly chosen element would be incorrectly classified.

The formula is:
""")
st.image("gini.png", width=300)
st.markdown("""
With 50% Yes and 50% No:

$$ Gini(Y) = 1 - (0.5^2 + 0.5^2) = 0.5 $$

A lower Gini means more purity.
""", unsafe_allow_html=True)

# Construction of Decision Tree
st.markdown("<h2>How a Decision Tree is Built</h2>", unsafe_allow_html=True)
st.markdown("""
The tree grows top-down, choosing the best feature at each step based on how well it splits the data. The process ends when:
- All samples in a node are of one class.
- A stopping condition like max depth is reached.
""", unsafe_allow_html=True)

# Iris Dataset
st.markdown("<h2>Iris Dataset Example</h2>", unsafe_allow_html=True)
st.markdown("""
This tree is trained on the famous **Iris dataset**, where features like petal length help classify the flower species.
""", unsafe_allow_html=True)
st.image("dt1 (1).jpg", caption="Decision Tree for Iris Dataset", use_container_width=True)

# Training & Testing - Classification
st.markdown("<h2>Training & Testing: Classification</h2>", unsafe_allow_html=True)
st.markdown("""
- During **training**, the model learns rules from labeled data using Gini or Entropy.
- In the **testing phase**, new samples are passed through the tree to make predictions.

Example: Predict Iris species based on its features.
""", unsafe_allow_html=True)

# Training & Testing - Regression
st.markdown("<h2>Training & Testing: Regression</h2>", unsafe_allow_html=True)
st.markdown("""
- For regression, the tree splits data to reduce **Mean Squared Error (MSE)**.
- Each leaf node predicts a continuous value (e.g., house price).

Example: Predicting house prices based on area, number of rooms, etc.
""", unsafe_allow_html=True)

# Pre-Pruning
st.markdown("<h2>Controlling Overfitting: Pre-Pruning</h2>", unsafe_allow_html=True)
st.markdown("""
**Pre-pruning** stops the tree from growing too large.

Techniques:
- **Max Depth**: Limits how deep the tree can go.
- **Min Samples Split**: Minimum data points needed to split a node.
- **Min Samples Leaf**: Minimum data points required in a leaf.
- **Max Features**: Restricts number of features used per split.
""", unsafe_allow_html=True)

# Post-Pruning
st.markdown("<h2>Post-Pruning: Simplifying After Training</h2>", unsafe_allow_html=True)
st.markdown("""
**Post-pruning** trims the tree **after** full training to reduce complexity.

Methods:
- **Cost Complexity Pruning**
- **Validation Set Pruning**
""", unsafe_allow_html=True)

# Feature Selection
st.markdown("<h2>Feature Selection with Trees</h2>", unsafe_allow_html=True)
st.markdown("""
Decision Trees can rank features by how much they reduce impurity at each split.

Here's the formula used:
""")
st.image("feature.png", width=500)
st.markdown("""
The higher the score, the more important the feature.
""", unsafe_allow_html=True)

# Implementation Link
st.markdown("<h2>Try It Yourself</h2>", unsafe_allow_html=True)
st.markdown(
    "<a href='https://colab.research.google.com/drive/1SqZ5I5h7ivS6SJDwlOZQ-V4IAOg90RE7?usp=sharing' target='_blank' style='font-size: 16px; color: #add8e6;'>Open Jupyter Notebook</a>", 
    unsafe_allow_html=True
)