import streamlit as st

# Set page configuration
st.set_page_config(page_title="Decision Tree Theory", layout="wide")

# Updated CSS styling
st.markdown(""" """, unsafe_allow_html=True)

# Title
st.markdown("""
# Decision Tree
""", unsafe_allow_html=True)

# Introduction
st.markdown("""
A **Decision Tree** is a supervised learning method used for both classification and regression.
It models decisions in a tree structure, where:

- The **Root Node** represents the full dataset.
- **Internal Nodes** evaluate features to split the data.
- **Leaf Nodes** give the output label or value.

It's like asking a series of "yes or no" questions to reach a final decision.
""", unsafe_allow_html=True)

# Entropy
st.markdown("""
## Entropy: Quantifying Disorder
""", unsafe_allow_html=True)

st.markdown("""
**Entropy** helps measure randomness or impurity in data. The formula for entropy is:
""")
st.image("entropy-formula-2.jpg", width=300)

st.markdown(r"""
If you have two classes (Yes/No), each with a 50% chance:

$$
H(Y) = -\left(0.5 \cdot \log_2(0.5) + 0.5 \cdot \log_2(0.5)\right) = 1
$$

This means maximum uncertainty.
""", unsafe_allow_html=True)

# Gini Impurity
st.markdown("""
## Gini Impurity: Measuring Purity
""", unsafe_allow_html=True)

st.markdown("""
**Gini Impurity** is another metric that measures how often a randomly chosen element would be incorrectly classified. The formula is:
""")
st.image("gini.png", width=300)

st.markdown(r"""
With 50% Yes and 50% No:

$$
\mathrm{Gini}(Y) = 1 - (0.5^2 + 0.5^2) = 0.5
$$

A lower Gini means more purity.
""", unsafe_allow_html=True)

# Construction of Decision Tree
st.markdown("""
## How a Decision Tree is Built
""", unsafe_allow_html=True)

st.markdown("""
The tree grows top-down, choosing the best feature at each step based on how well it splits the data. The process ends when:

- All samples in a node are of one class.
- A stopping condition like max depth is reached.
""", unsafe_allow_html=True)

# Iris Dataset
st.markdown("""
## Iris Dataset Example
""", unsafe_allow_html=True)

st.markdown("""
This tree is trained on the famous **Iris dataset**, where features like petal length help classify the flower species.
""", unsafe_allow_html=True)
st.image("dt1 (1).jpg", caption="Decision Tree for Iris Dataset", use_container_width=True)

# Training & Testing - Classification
st.markdown("""
## Training & Testing: Classification
""", unsafe_allow_html=True)

st.markdown("""
- During **training**, the model learns rules from labeled data using Gini or Entropy.
- In the **testing phase**, new samples are passed through the tree to make predictions.

Example: Predict the Iris species based on its features.
""", unsafe_allow_html=True)

# Training & Testing - Regression
st.markdown("""
## Training & Testing: Regression
""", unsafe_allow_html=True)

st.markdown("""
- For regression, the tree splits data to reduce **Mean Squared Error (MSE)**.
- Each leaf node predicts a continuous value (e.g., house price).

Example: Predicting house prices based on area, number of rooms, etc.
""", unsafe_allow_html=True)

# Pre-Pruning
st.markdown("""
## Controlling Overfitting: Pre-Pruning
""", unsafe_allow_html=True)

st.markdown("""
**Pre-pruning** stops the tree from growing too large. Techniques:

- **Max Depth**: Limits how deep the tree can go.
- **Min Samples Split**: Minimum data points needed to split a node.
- **Min Samples Leaf**: Minimum data points required in a leaf.
- **Max Features**: Restricts the number of features used per split.
""", unsafe_allow_html=True)

# Post-Pruning
st.markdown("""
## Post-Pruning: Simplifying After Training
""", unsafe_allow_html=True)

st.markdown("""
**Post-pruning** trims the tree **after** full training to reduce complexity. Methods:

- **Cost Complexity Pruning**
- **Validation Set Pruning**
""", unsafe_allow_html=True)

# Feature Selection
st.markdown("""
## Feature Selection with Trees
""", unsafe_allow_html=True)

st.markdown("""
Decision Trees can rank features by how much they reduce impurity at each split. Here's the formula used:
""")
st.image("feature.png", width=500)

st.markdown("""
The higher the score, the more important the feature.
""", unsafe_allow_html=True)

# Implementation Link
st.markdown("""
## Try It Yourself
""", unsafe_allow_html=True)

st.markdown(
    "Open Jupyter Notebook",
    unsafe_allow_html=True,
)