import streamlit as st
# Set page configuration
st.set_page_config(page_title="Decision Tree Theory", layout="wide")
# CSS styling
st.markdown("""
<style>
.stApp {
background-color: #f2f6fa;
}
h1, h2, h3 {
color: #1a237e;
}
.custom-font, p, li {
font-family: 'Arial', sans-serif;
font-size: 18px;
color: #212121;
line-height: 1.6;
}
</style>
""", unsafe_allow_html=True)
# Title
st.markdown("<h1>Decision Tree</h1>", unsafe_allow_html=True)
# Introduction
st.markdown("""
A **Decision Tree** is a supervised learning method used for both classification and regression. It models decisions in a tree structure, where:
- The **Root Node** represents the full dataset.
- **Internal Nodes** evaluate features to split the data.
- **Leaf Nodes** give the output label or value.

It's like asking a series of "yes or no" questions to reach a final decision.
""", unsafe_allow_html=True)
# Entropy
st.markdown("<h2>Entropy: Quantifying Disorder</h2>", unsafe_allow_html=True)
st.markdown("""
**Entropy** measures the randomness, or impurity, of the class labels in a dataset: the more mixed the classes, the higher the entropy.
The formula for entropy is:
""")
st.image("entropy-formula-2.jpg", width=300)
st.markdown(r"""
If you have two classes (Yes/No), each with a 50% chance:
$$ H(Y) = - (0.5 \cdot \log_2(0.5) + 0.5 \cdot \log_2(0.5)) = 1 $$
This is the maximum possible uncertainty for two classes.
""", unsafe_allow_html=True)
# Gini Impurity
st.markdown("<h2>Gini Impurity: Measuring Purity</h2>", unsafe_allow_html=True)
st.markdown("""
**Gini Impurity** is another metric: it measures how often a randomly chosen element would be misclassified if it were labeled at random according to the class distribution.
The formula is:
""")
st.image("gini.png", width=300)
st.markdown("""
With 50% Yes and 50% No:
$$ Gini(Y) = 1 - (0.5^2 + 0.5^2) = 0.5 $$
A lower Gini value means a purer node (0.5 is the maximum for two classes).
""", unsafe_allow_html=True)
# Construction of Decision Tree
st.markdown("<h2>How a Decision Tree is Built</h2>", unsafe_allow_html=True)
st.markdown("""
The tree is grown top-down and greedily: at each step it picks the feature (and threshold) whose split gives the largest impurity reduction, as in the sketch below. Growth stops when:
- All samples in a node belong to one class.
- A stopping condition, such as a maximum depth, is reached.
""", unsafe_allow_html=True)
# Iris Dataset
st.markdown("<h2>Iris Dataset Example</h2>", unsafe_allow_html=True)
st.markdown("""
This tree is trained on the famous **Iris dataset**, where features like petal length help classify the flower species.
""", unsafe_allow_html=True)
st.image("dt1 (1).jpg", caption="Decision Tree for Iris Dataset", use_container_width=True)
# Training & Testing - Classification
st.markdown("<h2>Training & Testing: Classification</h2>", unsafe_allow_html=True)
st.markdown("""
- During **training**, the model learns rules from labeled data using Gini or Entropy.
- In the **testing phase**, new samples are passed through the tree to make predictions.
Example: Predict Iris species based on its features.
""", unsafe_allow_html=True)
# Training & Testing - Regression
st.markdown("<h2>Training & Testing: Regression</h2>", unsafe_allow_html=True)
st.markdown("""
- For regression, the tree splits data to reduce **Mean Squared Error (MSE)**.
- Each leaf node predicts a continuous value (e.g., house price).
Example: Predicting house prices based on area, number of rooms, etc.
""", unsafe_allow_html=True)
# Pre-Pruning
st.markdown("<h2>Controlling Overfitting: Pre-Pruning</h2>", unsafe_allow_html=True)
st.markdown("""
**Pre-pruning** stops the tree from growing too large.
Techniques:
- **Max Depth**: Limits how deep the tree can go.
- **Min Samples Split**: Minimum data points needed to split a node.
- **Min Samples Leaf**: Minimum data points required in a leaf.
- **Max Features**: Restricts number of features used per split.
""", unsafe_allow_html=True)
# Post-Pruning
st.markdown("<h2>Post-Pruning: Simplifying After Training</h2>", unsafe_allow_html=True)
st.markdown("""
**Post-pruning** trims the tree **after** full training to reduce complexity.
Methods:
- **Cost Complexity Pruning**: removes branches whose impurity reduction does not justify the added complexity.
- **Validation Set Pruning**: removes branches that do not improve accuracy on held-out data.
""", unsafe_allow_html=True)
# Feature Selection
st.markdown("<h2>Feature Selection with Trees</h2>", unsafe_allow_html=True)
st.markdown("""
Decision Trees can rank features by how much they reduce impurity at each split.
Here's the formula used:
""")
st.image("feature.png", width=500)
st.markdown("""
The higher the score, the more important the feature.
""", unsafe_allow_html=True)
# Implementation Link
st.markdown("<h2>Try It Yourself</h2>", unsafe_allow_html=True)
st.markdown(
"<a href='https://colab.research.google.com/drive/1SqZ5I5h7ivS6SJDwlOZQ-V4IAOg90RE7?usp=sharing' target='_blank' style='font-size: 16px; color: #1a237e;'>Open Jupyter Notebook</a>",
unsafe_allow_html=True
)