|
import streamlit as st |
|
|
|
|
|
st.set_page_config(page_title="Decision Tree Theory", layout="wide") |
|
|
|
|
|
st.markdown(""" |
|
<style> |
|
.stApp { |
|
background-color: #f2f6fa; |
|
} |
|
h1, h2, h3 { |
|
color: #1a237e; |
|
} |
|
.custom-font, p, li { |
|
font-family: 'Arial', sans-serif; |
|
font-size: 18px; |
|
color: #212121; |
|
line-height: 1.6; |
|
} |
|
</style> |
|
""", unsafe_allow_html=True) |
|
|
|
|
|
st.markdown("<h1>Decision Tree</h1>", unsafe_allow_html=True) |
|
|
|
|
|
st.markdown(""" |
|
A **Decision Tree** is a supervised learning method used for both classification and regression. It models decisions in a tree structure, where: |
|
- The **Root Node** represents the full dataset. |
|
- **Internal Nodes** evaluate features to split the data. |
|
- **Leaf Nodes** give the output label or value. |
|
|
|
It's like asking a series of "yes or no" questions to reach a final decision. |
|
""", unsafe_allow_html=True) |
|
|
|
|
|
st.markdown("<h2>Entropy: Quantifying Disorder</h2>", unsafe_allow_html=True) |
|
st.markdown(""" |
|
**Entropy** helps measure randomness or impurity in data. |
|
|
|
The formula for entropy is: |
|
""") |
|
st.image("entropy-formula-2.jpg", width=300) |
|
st.markdown(""" |
|
If you have two classes (Yes/No) each with a 50% chance: |
|
|
|
$$ H(Y) = - (0.5 \cdot \log_2(0.5) + 0.5 \cdot \log_2(0.5)) = 1 $$ |
|
|
|
This means maximum uncertainty. |
|
""", unsafe_allow_html=True) |
|
|
|
|
|
st.markdown("<h2>Gini Impurity: Measuring Purity</h2>", unsafe_allow_html=True) |
|
st.markdown(""" |
|
**Gini Impurity** is another metric: it measures how often a randomly chosen element would be misclassified if it were labeled randomly according to the node's class distribution.
|
|
|
The formula is: |
|
""") |
|
st.image("gini.png", width=300) |
|
st.markdown(""" |
|
With 50% Yes and 50% No: |
|
|
|
$$ Gini(Y) = 1 - (0.5^2 + 0.5^2) = 0.5 $$ |
|
|
|
A lower Gini value means a purer node: 0 indicates that all samples belong to a single class.
|
""", unsafe_allow_html=True) |
|
|
|
|
|
st.markdown("<h2>How a Decision Tree is Built</h2>", unsafe_allow_html=True) |
|
st.markdown(""" |
|
The tree grows top-down, greedily choosing at each step the feature and threshold that best split the data (highest information gain, or equivalently the largest impurity reduction). The process ends when:
|
- All samples in a node are of one class. |
|
- A stopping condition like max depth is reached. |
|
""", unsafe_allow_html=True) |
|
|
|
|
|
st.markdown("<h2>Iris Dataset Example</h2>", unsafe_allow_html=True) |
|
st.markdown(""" |
|
This tree is trained on the famous **Iris dataset**, where features like petal length help classify the flower species. |
|
""", unsafe_allow_html=True) |
|
st.image("dt1 (1).jpg", caption="Decision Tree for Iris Dataset", use_container_width=True) |
|
|
|
|
|
st.markdown("<h2>Training & Testing: Classification</h2>", unsafe_allow_html=True) |
|
st.markdown(""" |
|
- During **training**, the model learns split rules from labeled data using an impurity criterion such as Gini or entropy.
|
- In the **testing phase**, new samples are passed through the tree to make predictions. |
|
|
|
Example: Predict Iris species based on its features. |
|
""", unsafe_allow_html=True) |
|
|
|
|
|
st.markdown("<h2>Training & Testing: Regression</h2>", unsafe_allow_html=True) |
|
st.markdown(""" |
|
- For regression, the tree splits data to reduce **Mean Squared Error (MSE)**. |
|
- Each leaf node predicts a continuous value (e.g., house price). |
|
|
|
Example: Predicting house prices based on area, number of rooms, etc. |
|
""", unsafe_allow_html=True) |
|
|
|
|
|
st.markdown("<h2>Controlling Overfitting: Pre-Pruning</h2>", unsafe_allow_html=True) |
|
st.markdown(""" |
|
**Pre-pruning** stops the tree from growing too large. |
|
|
|
Techniques: |
|
- **Max Depth**: Limits how deep the tree can go. |
|
- **Min Samples Split**: Minimum data points needed to split a node. |
|
- **Min Samples Leaf**: Minimum data points required in a leaf. |
|
- **Max Features**: Restricts number of features used per split. |
|
""", unsafe_allow_html=True) |
|
|
|
|
|
st.markdown("<h2>Post-Pruning: Simplifying After Training</h2>", unsafe_allow_html=True) |
|
st.markdown(""" |
|
**Post-pruning** trims the tree **after** full training to reduce complexity. |
|
|
|
Methods: |
|
- **Cost Complexity Pruning** |
|
- **Validation Set Pruning** |
|
""", unsafe_allow_html=True) |
|
|
|
|
|
st.markdown("<h2>Feature Selection with Trees</h2>", unsafe_allow_html=True) |
|
st.markdown(""" |
|
Decision Trees can rank features by how much they reduce impurity at each split. |
|
|
|
Here's the formula used: |
|
""") |
|
st.image("feature.png", width=500) |
|
st.markdown(""" |
|
The higher the score, the more important the feature. |
|
""", unsafe_allow_html=True) |
|
|
|
|
|
st.markdown("<h2>Try It Yourself</h2>", unsafe_allow_html=True) |
|
st.markdown( |
|
"<a href='https://colab.research.google.com/drive/1SqZ5I5h7ivS6SJDwlOZQ-V4IAOg90RE7?usp=sharing' target='_blank' style='font-size: 16px; color: #1a237e;'>Open Jupyter Notebook</a>", |
|
unsafe_allow_html=True |
|
) |
|
|