import streamlit as st
# Set page configuration
st.set_page_config(page_title="Decision Tree Theory", layout="wide")
# Title
st.markdown("
Decision Tree
", unsafe_allow_html=True)
# Introduction
st.markdown("""
A **Decision Tree** is a supervised learning method used for both classification and regression. It models decisions in a tree structure, where:
- The **Root Node** represents the full dataset.
- **Internal Nodes** evaluate features to split the data.
- **Leaf Nodes** give the output label or value.
It's like asking a series of "yes or no" questions to reach a final decision.
""", unsafe_allow_html=True)
# Entropy
st.markdown("Entropy: Quantifying Disorder
", unsafe_allow_html=True)
st.markdown("""
**Entropy** helps measure randomness or impurity in data.
The formula for entropy is:
""")
st.image("entropy-formula-2.jpg", width=300)
st.markdown("""
If you have two classes (Yes/No) each with a 50% chance:
$$ H(Y) = - (0.5 \cdot \log_2(0.5) + 0.5 \cdot \log_2(0.5)) = 1 $$
This means maximum uncertainty.
""", unsafe_allow_html=True)
# Gini Impurity
st.markdown("Gini Impurity: Measuring Purity
", unsafe_allow_html=True)
st.markdown("""
**Gini Impurity** is another metric that measures how often a randomly chosen element would be incorrectly classified.
The formula is:
""")
st.image("gini.png", width=300)
st.markdown("""
With 50% Yes and 50% No:
$$ Gini(Y) = 1 - (0.5^2 + 0.5^2) = 0.5 $$
A lower Gini means more purity.
""", unsafe_allow_html=True)
# Construction of Decision Tree
st.markdown("How a Decision Tree is Built
", unsafe_allow_html=True)
st.markdown("""
The tree grows top-down, choosing the best feature at each step based on how well it splits the data. The process ends when:
- All samples in a node are of one class.
- A stopping condition like max depth is reached.
""", unsafe_allow_html=True)
# Iris Dataset
st.markdown("Iris Dataset Example
", unsafe_allow_html=True)
st.markdown("""
This tree is trained on the famous **Iris dataset**, where features like petal length help classify the flower species.
""", unsafe_allow_html=True)
st.image("dt1 (1).jpg", caption="Decision Tree for Iris Dataset", use_container_width=True)
# Training & Testing - Classification
st.markdown("Training & Testing: Classification
", unsafe_allow_html=True)
st.markdown("""
- During **training**, the model learns rules from labeled data using Gini or Entropy.
- In the **testing phase**, new samples are passed through the tree to make predictions.
Example: Predict Iris species based on its features.
""", unsafe_allow_html=True)
# Training & Testing - Regression
st.markdown("Training & Testing: Regression
", unsafe_allow_html=True)
st.markdown("""
- For regression, the tree splits data to reduce **Mean Squared Error (MSE)**.
- Each leaf node predicts a continuous value (e.g., house price).
Example: Predicting house prices based on area, number of rooms, etc.
""", unsafe_allow_html=True)
# Pre-Pruning
st.markdown("Controlling Overfitting: Pre-Pruning
", unsafe_allow_html=True)
st.markdown("""
**Pre-pruning** stops the tree from growing too large.
Techniques:
- **Max Depth**: Limits how deep the tree can go.
- **Min Samples Split**: Minimum data points needed to split a node.
- **Min Samples Leaf**: Minimum data points required in a leaf.
- **Max Features**: Restricts number of features used per split.
""", unsafe_allow_html=True)
# Post-Pruning
st.markdown("Post-Pruning: Simplifying After Training
", unsafe_allow_html=True)
st.markdown("""
**Post-pruning** trims the tree **after** full training to reduce complexity.
Methods:
- **Cost Complexity Pruning**
- **Validation Set Pruning**
""", unsafe_allow_html=True)
# Feature Selection
st.markdown("Feature Selection with Trees
", unsafe_allow_html=True)
st.markdown("""
Decision Trees can rank features by how much they reduce impurity at each split.
Here's the formula used:
""")
st.image("feature.png", width=500)
st.markdown("""
The higher the score, the more important the feature.
""", unsafe_allow_html=True)
# Implementation Link
st.markdown("Try It Yourself
", unsafe_allow_html=True)
st.markdown(
"Open Jupyter Notebook",
unsafe_allow_html=True
)