Machine_Learning_Algorithims / pages /3Ensemble_Techniques.py
Sathwikchowdary's picture
Update pages/3Ensemble_Techniques.py
ad9bdc8 verified
raw
history blame
6.41 kB
import streamlit as st
# Page configuration
st.set_page_config(page_title="Ensemble Techniques", page_icon="🤖", layout="wide")
# Custom styling
st.markdown("""
<style>
.stApp {
background-color: #f2f6fa;
}
h1, h2, h3 {
color: #1a237e;
}
.custom-font, p, li {
font-family: 'Arial', sans-serif;
font-size: 18px;
color: #212121;
line-height: 1.6;
}
</style>
""", unsafe_allow_html=True)
# Title
st.markdown("<h1>Ensemble Learning Techniques</h1>", unsafe_allow_html=True)
# Introduction
st.markdown("""
Ensemble learning is a strategy in machine learning where **multiple models**—called base models—are combined to produce a more accurate and robust **ensemble model**. The core idea is that a group of diverse models often performs better than any individual model alone.
""", unsafe_allow_html=True)
st.markdown("**Assumption:** The base models should be **diverse**. If they are too similar, the overall ensemble may lose its advantage and yield poor results.")
# Types of Ensemble
st.markdown("<h2>Types of Ensemble Techniques</h2>", unsafe_allow_html=True)
st.write("Ensemble techniques vary based on how base models are built and how their outputs are combined.")
st.image("diff_ensemble_tecniques.png", width=900)
# Voting Ensemble
st.markdown("<h2>1. Voting Ensemble</h2>", unsafe_allow_html=True)
st.write("Voting is a straightforward ensemble approach suitable for both classification and regression. It aggregates the predictions from multiple models to make the final prediction.")
st.write("**Types:**")
st.write("- **Hard Voting**: Final output is the most frequent class label among base models.")
st.write("- **Soft Voting**: Uses the average of class probabilities to decide the output.")
st.markdown("**Steps for Classification:**")
st.markdown("""
1. Select different base models.
2. Train each on the same dataset.
3. Gather predictions.
4. Use hard or soft voting to finalize.
""")
st.image("voting.jpg", width=900)
st.markdown("**Steps for Regression:**")
st.markdown("""
1. Train various regression models.
2. Get predictions from all models.
3. Calculate the average or median of predictions.
""")
st.markdown("**Important Parameters:**")
st.markdown("- `voting`: Choose between 'hard' or 'soft' voting\n- `weights`: Assign relative importance to models")
# Voting implementation link
st.markdown("<h2>Voting Implementation Example</h2>", unsafe_allow_html=True)
st.markdown(
"<a href='https://colab.research.google.com/drive/1LPZR9RnvEXP8mzOLOBfSVVyHHZ7GFns4?usp=sharing' target='_blank' style='font-size: 16px; color: #1a237e;'>Open Jupyter Notebook</a>",
unsafe_allow_html=True
)
# Bagging
st.markdown("<h2>2. Bagging (Bootstrap Aggregating)</h2>", unsafe_allow_html=True)
st.write("Bagging boosts model performance by training the same algorithm on different random subsets (with replacement) of the dataset.")
st.write("Unlike voting, bagging keeps the algorithm fixed and varies the training data to create diverse models.")
st.write("**Variants:**")
st.write("- **Bagging**: General form, any model can be used.")
st.write("- **Random Forest**: Special form using decision trees with added randomness.")
st.image("bagging.jpg", width=900)
st.markdown("**Steps for Classification:**")
st.markdown("""
1. Generate bootstrapped samples.
2. Train models on each sample.
3. Aggregate outputs using majority vote.
""")
st.markdown("**Steps for Regression:**")
st.markdown("""
1. Create random samples from the dataset.
2. Train models on each.
3. Average the predictions.
""")
st.markdown("<h2>How to Create Bootstrapped Samples</h2>", unsafe_allow_html=True)
st.write("**Row and Column Sampling** help increase model diversity in bagging.")
st.write("**Row Sampling:**")
st.write("- With Replacement: Duplicates allowed (classic bootstrapping)")
st.write("- Without Replacement: Unique rows only (pasting)")
st.write("**Column Sampling:**")
st.write("- With Replacement: Some features may repeat.")
st.write("- Without Replacement: Each feature is used only once per model.")
st.markdown("**Important Parameters:**")
st.markdown("- `n_estimators`: Number of models to train\n- `max_samples`: % of data per model\n- `bootstrap`: Whether sampling is with replacement")
# Bagging implementation link
st.markdown("<h2>Bagging Implementation Example</h2>", unsafe_allow_html=True)
st.markdown(
"<a href='https://colab.research.google.com/drive/1cumZl7H9fqyORfaw236WWxQViJxvSKHV?usp=sharing' target='_blank' style='font-size: 16px; color: #1a237e;'>Open Jupyter Notebook</a>",
unsafe_allow_html=True
)
# Random Forest
st.markdown("<h2>3. Random Forest</h2>", unsafe_allow_html=True)
st.write("Random Forest is a popular ensemble method that builds multiple decision trees using bootstrapped samples. It adds another layer of randomness by selecting a subset of features at each split.")
st.image("randomforest.jpg", width=900)
st.markdown("**Steps for Classification:**")
st.markdown("""
1. Create bootstrapped samples.
2. Train decision trees using random feature selection at each split.
3. Combine predictions using majority vote.
""")
st.markdown("**Steps for Regression:**")
st.markdown("""
1. Prepare bootstrapped training sets.
2. Train decision tree regressors with random feature splits.
3. Predict by averaging model outputs.
""")
st.markdown("**Bagging vs Random Forest:**")
st.markdown("""
- **Bagging:** Any algorithm, row/column sampling optional
- **Random Forest:** Uses decision trees only, always samples rows & features
- **Bagging:** No internal randomness
- **Random Forest:** Adds randomness via feature selection
""")
# Random Forest implementation link
st.markdown("<h2>Random Forest Implementation Example</h2>", unsafe_allow_html=True)
st.markdown(
"<a href='https://colab.research.google.com/drive/1S6YyfTx9N35E5fpPF0z6ZDm85BSp1deT?usp=sharing' target='_blank' style='font-size: 16px; color: #1a237e;'>Open Jupyter Notebook</a>",
unsafe_allow_html=True
)
# Conclusion
st.markdown("""
Ensemble learning is a powerful approach that enhances model accuracy, reduces overfitting, and improves robustness. Choosing between techniques like **Voting**, **Bagging**, and **Random Forest** depends on your use case and the nature of the data.
""", unsafe_allow_html=True)