Spaces: Sathwikchowdary/Machine_Learning_Algorithims


Sathwikchowdary committed on Apr 8

Commit 578a444 (verified) · 1 Parent(s): 0683b2a

Create 2Decision-Tree.py

Files changed (1)

pages/2Decision-Tree.py +141 -0

pages/2Decision-Tree.py ADDED

	@@ -0,0 +1,141 @@

+import streamlit as st
+# Set page configuration
+st.set_page_config(page_title="Decision Tree Theory", layout="wide")
+# Custom CSS for styling
+st.markdown("""
+    <style>
+        .stApp {
+            background: linear-gradient(135deg, #1e3c72, #2a5298);
+        }
+        h1, h2 {
+            color: #fdfdfd;
+        }
+        p, li {
+            font-family: 'Arial', sans-serif;
+            font-size: 18px;
+            color: #f0f0f0;
+            line-height: 1.6;
+        }
+    </style>
+""", unsafe_allow_html=True)
+# Title
+st.markdown("<h1>Decision Tree</h1>", unsafe_allow_html=True)
+# Introduction
+st.markdown("""
+A **Decision Tree** is a supervised learning method used for both classification and regression. It models decisions in a tree structure, where:
+- The **Root Node** represents the full dataset.
+- **Internal Nodes** test a feature to split the data.
+- **Leaf Nodes** give the output label or value.
+
+Following the tree is like asking a series of "yes or no" questions to reach a final decision.
+""", unsafe_allow_html=True)
+# Entropy
+st.markdown("<h2>Entropy: Quantifying Disorder</h2>", unsafe_allow_html=True)
+st.markdown("""
+**Entropy** measures the randomness, or impurity, of the class labels in a node.
+The formula for entropy is:
+""")
+st.image("entropy-formula-2.jpg", width=300)
+st.markdown(r"""
+If you have two classes (Yes/No), each with a 50% chance:
+$$ H(Y) = - (0.5 \cdot \log_2(0.5) + 0.5 \cdot \log_2(0.5)) = 1 $$
+An entropy of 1 bit is the maximum for two classes, meaning maximum uncertainty.
+""", unsafe_allow_html=True)
+# Gini Impurity
+st.markdown("<h2>Gini Impurity: Measuring Purity</h2>", unsafe_allow_html=True)
+st.markdown("""
+**Gini Impurity** is another metric. It measures how often a randomly chosen element would be misclassified if it were labeled at random according to the class distribution in the node.
+The formula is:
+""")
+st.image("gini.png", width=300)
+st.markdown("""
+With 50% Yes and 50% No:
+$$ Gini(Y) = 1 - (0.5^2 + 0.5^2) = 0.5 $$
+A lower Gini value means a purer node: 0 means every sample belongs to one class, and 0.5 is the maximum for two classes.
+""", unsafe_allow_html=True)
+# Construction of Decision Tree
+st.markdown("<h2>How a Decision Tree is Built</h2>", unsafe_allow_html=True)
+st.markdown("""
+The tree grows top-down. At each node it chooses the feature (and split point) that best separates the data, i.e. the split that most reduces impurity. A branch stops growing when:
+- All samples in a node belong to one class.
+- A stopping condition such as maximum depth is reached.
+""", unsafe_allow_html=True)
+# Iris Dataset
+st.markdown("<h2>Iris Dataset Example</h2>", unsafe_allow_html=True)
+st.markdown("""
+The tree below is trained on the well-known **Iris dataset**, where features like petal length help classify the flower species.
+""", unsafe_allow_html=True)
+st.image("dt1 (1).jpg", caption="Decision Tree for Iris Dataset", use_container_width=True)
+# Training & Testing - Classification
+st.markdown("<h2>Training & Testing: Classification</h2>", unsafe_allow_html=True)
+st.markdown("""
+- During **training**, the model learns split rules from labeled data, using Gini impurity or entropy to score candidate splits.
+- During **testing**, new samples are passed down the tree from the root to a leaf to make predictions.
+- **Example**: Predicting the Iris species from its features.
+""", unsafe_allow_html=True)
+# Training & Testing - Regression
+st.markdown("<h2>Training & Testing: Regression</h2>", unsafe_allow_html=True)
+st.markdown("""
+- For regression, the tree chooses splits that reduce the **Mean Squared Error (MSE)** within each node.
+- Each leaf node predicts a continuous value (e.g., a house price), typically the mean of the training targets in that leaf.
+- **Example**: Predicting house prices based on area, number of rooms, etc.
+""", unsafe_allow_html=True)
+# Pre-Pruning
+st.markdown("<h2>Controlling Overfitting: Pre-Pruning</h2>", unsafe_allow_html=True)
+st.markdown("""
+**Pre-pruning** stops the tree from growing too large by applying limits while it is being built.
+Techniques:
+- **Max Depth**: Limits how deep the tree can go.
+- **Min Samples Split**: Minimum number of samples a node must contain before it can be split.
+- **Min Samples Leaf**: Minimum number of samples required in each leaf.
+- **Max Features**: Restricts the number of features considered at each split.
+""", unsafe_allow_html=True)
+# Post-Pruning
+st.markdown("<h2>Post-Pruning: Simplifying After Training</h2>", unsafe_allow_html=True)
+st.markdown("""
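+**Post-pruning** trims the tree **after** it has been fully grown, removing branches that add complexity without improving generalization.
+Methods:
+- **Cost Complexity Pruning**: removes the subtrees that contribute least relative to their size, controlled by a complexity parameter.
+- **Validation Set Pruning**: removes a subtree when doing so does not hurt accuracy on held-out validation data.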
+""", unsafe_allow_html=True)
+# Feature Selection
+st.markdown("<h2>Feature Selection with Trees</h2>", unsafe_allow_html=True)
+st.markdown("""
+Decision Trees can rank features by the total impurity reduction they produce across the splits that use them.
+Here's the formula used:
+""")
+st.image("feature.png", width=500)
+st.markdown("""
+The higher the score, the more important the feature.
+""", unsafe_allow_html=True)
+# Implementation Link
+st.markdown("<h2>Try It Yourself</h2>", unsafe_allow_html=True)
+st.markdown(
+    "<a href='https://colab.research.google.com/drive/1SqZ5I5h7ivS6SJDwlOZQ-V4IAOg90RE7?usp=sharing' target='_blank' style='font-size: 16px; color: #add8e6;'>Open Jupyter Notebook</a>",
+    unsafe_allow_html=True
+)
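
The entropy and Gini figures quoted in the page text (1 and 0.5 for a 50/50 split) can be checked with a few lines of plain Python. The sketch below is not part of this commit. The function names `entropy`, `gini`, and `impurity_decrease` are illustrative, and `impurity_decrease` is only one common way to score a candidate split (parent impurity minus the size-weighted impurity of the children); it is not necessarily the formula shown in `feature.png`.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    total = len(labels)
    # Sum -p * log2(p) over the observed class proportions.
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def impurity_decrease(parent, left, right, impurity=gini):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# A node with five "Yes" and five "No" samples, as in the 50/50 example.
balanced = ["Yes", "No"] * 5
print("entropy:", entropy(balanced))
print("gini:", gini(balanced))

# A split that separates the two classes perfectly removes all impurity.
print("decrease:", impurity_decrease(balanced, ["Yes"] * 5, ["No"] * 5))
```

A tree builder would evaluate `impurity_decrease` for every candidate split at a node and keep the one with the largest value, which matches the "best feature at each step" description in the page text.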