|
import streamlit as st |
|
|
|
|
|
st.set_page_config(page_title="Ensemble Techniques", page_icon="🤖", layout="wide") |
|
|
|
|
|
st.markdown(""" |
|
<style> |
|
.stApp { |
|
background-color: #f2f6fa; |
|
} |
|
h1, h2, h3 { |
|
color: #1a237e; |
|
} |
|
.custom-font, p, li { |
|
font-family: 'Arial', sans-serif; |
|
font-size: 18px; |
|
color: #212121; |
|
line-height: 1.6; |
|
} |
|
</style> |
|
""", unsafe_allow_html=True) |
|
|
|
|
|
st.markdown("<h1>Ensemble Learning Techniques</h1>", unsafe_allow_html=True) |
|
|
|
|
|
st.markdown(""" |
|
Ensemble learning is a strategy in machine learning where **multiple models**—called base models—are combined to produce a more accurate and robust **ensemble model**. The core idea is that a group of diverse models often performs better than any individual model alone. |
|
""", unsafe_allow_html=True) |
|
|
|
st.markdown("**Assumption:** The base models should be **diverse**. If they are too similar, the overall ensemble may lose its advantage and yield poor results.") |
|
|
|
|
|
st.markdown("<h2>Types of Ensemble Techniques</h2>", unsafe_allow_html=True) |
|
st.write("Ensemble techniques vary based on how base models are built and how their outputs are combined.") |
|
st.image("diff_ensemble_tecniques.png", width=900) |
|
|
|
|
|
st.markdown("<h2>1. Voting Ensemble</h2>", unsafe_allow_html=True) |
|
st.write("Voting is a straightforward ensemble approach suitable for both classification and regression. It aggregates the predictions from multiple models to make the final prediction.") |
|
|
|
st.write("**Types:**") |
|
st.write("- **Hard Voting**: Final output is the most frequent class label among base models.") |
|
st.write("- **Soft Voting**: Uses the average of class probabilities to decide the output.") |
|
|
|
st.markdown("**Steps for Classification:**") |
|
st.markdown(""" |
|
1. Select different base models. |
|
2. Train each on the same dataset. |
|
3. Gather predictions. |
|
4. Use hard or soft voting to finalize. |
|
""") |
|
|
|
st.image("voting.jpg", width=900) |
|
|
|
st.markdown("**Steps for Regression:**") |
|
st.markdown(""" |
|
1. Train various regression models. |
|
2. Get predictions from all models. |
|
3. Calculate the average or median of predictions. |
|
""") |
|
|
|
st.markdown("**Important Parameters:**") |
|
st.markdown("- `voting`: Choose between 'hard' or 'soft' voting\n- `weights`: Assign relative importance to models") |
|
|
|
|
|
st.markdown("<h2>Voting Implementation Example</h2>", unsafe_allow_html=True) |
|
st.markdown( |
|
"<a href='https://colab.research.google.com/drive/1LPZR9RnvEXP8mzOLOBfSVVyHHZ7GFns4?usp=sharing' target='_blank' style='font-size: 16px; color: #1a237e;'>Open Jupyter Notebook</a>", |
|
unsafe_allow_html=True |
|
) |
|
|
|
|
|
st.markdown("<h2>2. Bagging (Bootstrap Aggregating)</h2>", unsafe_allow_html=True) |
|
st.write("Bagging boosts model performance by training the same algorithm on different random subsets (with replacement) of the dataset.") |
|
|
|
st.write("Unlike voting, bagging keeps the algorithm fixed and varies the training data to create diverse models.") |
|
|
|
st.write("**Variants:**") |
|
st.write("- **Bagging**: General form, any model can be used.") |
|
st.write("- **Random Forest**: Special form using decision trees with added randomness.") |
|
|
|
st.image("bagging.jpg", width=900) |
|
|
|
st.markdown("**Steps for Classification:**") |
|
st.markdown(""" |
|
1. Generate bootstrapped samples. |
|
2. Train models on each sample. |
|
3. Aggregate outputs using majority vote. |
|
""") |
|
|
|
st.markdown("**Steps for Regression:**") |
|
st.markdown(""" |
|
1. Create random samples from the dataset. |
|
2. Train models on each. |
|
3. Average the predictions. |
|
""") |
|
|
|
st.markdown("<h2>How to Create Bootstrapped Samples</h2>", unsafe_allow_html=True) |
|
st.write("**Row and Column Sampling** help increase model diversity in bagging.") |
|
|
|
st.write("**Row Sampling:**") |
|
st.write("- With Replacement: Duplicates allowed (classic bootstrapping)") |
|
st.write("- Without Replacement: Unique rows only (pasting)") |
|
|
|
st.write("**Column Sampling:**") |
|
st.write("- With Replacement: Some features may repeat.") |
|
st.write("- Without Replacement: Each feature is used only once per model.") |
|
|
|
st.markdown("**Important Parameters:**") |
|
st.markdown("- `n_estimators`: Number of models to train\n- `max_samples`: % of data per model\n- `bootstrap`: Whether sampling is with replacement") |
|
|
|
|
|
st.markdown("<h2>Bagging Implementation Example</h2>", unsafe_allow_html=True) |
|
st.markdown( |
|
"<a href='https://colab.research.google.com/drive/1cumZl7H9fqyORfaw236WWxQViJxvSKHV?usp=sharing' target='_blank' style='font-size: 16px; color: #1a237e;'>Open Jupyter Notebook</a>", |
|
unsafe_allow_html=True |
|
) |
|
|
|
|
|
st.markdown("<h2>3. Random Forest</h2>", unsafe_allow_html=True) |
|
st.write("Random Forest is a popular ensemble method that builds multiple decision trees using bootstrapped samples. It adds another layer of randomness by selecting a subset of features at each split.") |
|
|
|
st.image("randomforest.jpg", width=900) |
|
|
|
st.markdown("**Steps for Classification:**") |
|
st.markdown(""" |
|
1. Create bootstrapped samples. |
|
2. Train decision trees using random feature selection at each split. |
|
3. Combine predictions using majority vote. |
|
""") |
|
|
|
st.markdown("**Steps for Regression:**") |
|
st.markdown(""" |
|
1. Prepare bootstrapped training sets. |
|
2. Train decision tree regressors with random feature splits. |
|
3. Predict by averaging model outputs. |
|
""") |
|
|
|
st.markdown("**Bagging vs Random Forest:**") |
|
st.markdown(""" |
|
- **Bagging:** Any algorithm, row/column sampling optional |
|
- **Random Forest:** Uses decision trees only, always samples rows & features |
|
- **Bagging:** No internal randomness |
|
- **Random Forest:** Adds randomness via feature selection |
|
""") |
|
|
|
|
|
st.markdown("<h2>Random Forest Implementation Example</h2>", unsafe_allow_html=True) |
|
st.markdown( |
|
"<a href='https://colab.research.google.com/drive/1S6YyfTx9N35E5fpPF0z6ZDm85BSp1deT?usp=sharing' target='_blank' style='font-size: 16px; color: #1a237e;'>Open Jupyter Notebook</a>", |
|
unsafe_allow_html=True |
|
) |
|
|
|
|
|
st.markdown(""" |
|
Ensemble learning is a powerful approach that enhances model accuracy, reduces overfitting, and improves robustness. Choosing between techniques like **Voting**, **Bagging**, and **Random Forest** depends on your use case and the nature of the data. |
|
""", unsafe_allow_html=True) |
|
|