import streamlit as st

st.set_page_config(page_title="KNN Overview", page_icon="π", layout="wide")

st.markdown("""
<style>
.stApp {
    background-color: #f2f6fa;
}
h1, h2, h3 {
    color: #1a237e;
}
.custom-font, p {
    font-family: 'Arial', sans-serif;
    font-size: 18px;
    color: #212121;
    line-height: 1.6;
}
</style>
""", unsafe_allow_html=True)

st.markdown("<h1 style='color: #1a237e;'>Understanding K-Nearest Neighbors (KNN)</h1>", unsafe_allow_html=True)

st.write("""
K-Nearest Neighbors (KNN) is a fundamental machine learning method suitable for both **classification** and **regression** problems. It makes predictions by looking at the `k` training points closest to the input.

Key features:

- KNN is a non-parametric model: it assumes no fixed functional form for the data.
- It is a lazy, instance-based learner that stores the training data instead of fitting explicit parameters.
- Distance metrics such as **Euclidean** distance determine how similar two data points are.
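
As a rough illustration of how a distance metric scores similarity, the sketch below computes Euclidean and Manhattan distance between two feature vectors in plain Python. The function names are illustrative and are not taken from any library.

```python
import math

def euclidean(a, b):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # City-block distance: sum of the absolute differences per feature.
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```

A smaller distance means the two points are treated as more similar.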
""")

st.markdown("<h2 style='color: #1a237e;'>How KNN Functions</h2>", unsafe_allow_html=True)

st.subheader("Training Phase")
st.write("""
- KNN doesn't train a model in the traditional sense.
- It stores the dataset and uses it during prediction.
""")

st.subheader("Prediction - Classification")
st.write("""
1. Set the value of `k`.
2. Calculate the distance between the input and each point in the training data.
3. Identify the `k` nearest neighbors.
4. Use majority voting to assign the class label.
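
The steps above can be sketched in plain Python as follows. This is a minimal brute-force version for illustration; `knn_classify` and the toy data are assumptions made for this example, not part of any library.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, query, k):
    # Step 2: distance from the query to every training point.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    # Step 3: keep the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: majority vote over the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.0), (6.5, 8.0)]
train_y = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_classify(train_X, train_y, (2.0, 2.0), k=3))  # red
print(knn_classify(train_X, train_y, (6.0, 7.0), k=3))  # blue
```

In this sketch a tied vote goes to the label seen first among the neighbors, so an odd `k` is a common choice for two-class problems.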
""")

st.subheader("Prediction - Regression")
st.write("""
1. Choose `k`.
2. Find the distances to all training points.
3. Pick the closest `k` neighbors.
4. Predict using the **average** or **weighted average** of their values.
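
A minimal sketch of these steps, assuming inverse-distance weights for the weighted variant (other weighting schemes are possible). `knn_regress` and the one-feature toy data are hypothetical names chosen for this example.

```python
import math

def knn_regress(train_X, train_y, query, k, weighted=False):
    # Steps 2-3: sort training points by distance and keep the k closest.
    nearest = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    if not weighted:
        # Step 4a: plain average of the neighbors' target values.
        return sum(y for _, y in nearest) / len(nearest)
    # Step 4b: inverse-distance weighted average.
    for d, y in nearest:
        if d == 0:
            return y  # exact match: avoid dividing by zero
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

train_X = [(1.0,), (2.0,), (4.0,)]
train_y = [10.0, 20.0, 40.0]

print(knn_regress(train_X, train_y, (3.5,), k=2))                 # 30.0
print(knn_regress(train_X, train_y, (3.5,), k=2, weighted=True))  # approximately 35.0
```

The weighted prediction is pulled toward 40.0 because the neighbor at 4.0 is three times closer to the query than the neighbor at 2.0.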
""")

st.subheader("Model Behavior")
st.write("""
- **Overfitting**: Occurs when `k` is very low, so predictions follow the noise in individual training points.
- **Underfitting**: Happens when `k` is very high, so predictions average over too many points and the model oversimplifies.
- **Optimal Fit**: Found by balancing the two, typically by choosing `k` with cross-validation.
""")

st.subheader("Error Analysis")
st.write("""
- **Training Error**: Error on the dataset used for fitting. With `k = 1` it is typically zero, because each point is its own nearest neighbor.
- **Cross-Validation Error**: Error on held-out validation data that was not used for fitting.
- A well-tuned model shows low error on both; low training error with high cross-validation error signals overfitting.
""")

st.subheader("Hyperparameter Choices")
st.write("""
Important tuning options for KNN include:

- `k`: Number of neighbors (`n_neighbors` in scikit-learn)
- `weights`: `uniform` or `distance`
- `metric`: Distance formula, such as Euclidean or Manhattan
- `n_jobs`: Number of parallel jobs for the neighbor search (affects speed, not predictions)
""")

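st.subheader("Why Scaling is Crucial")
st.write("""
KNN relies heavily on distances, so features with large numeric ranges dominate the result unless the features are scaled. Use:

- **Min-Max Normalization** to rescale values to the range 0 to 1.
- **Z-score Standardization** to center each feature at zero mean with unit variance.

Fit the scaler on the training data only, then apply that same fitted scaler to the test data. Fitting on the test data leaks information into the evaluation.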
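
A minimal sketch of both scalers in plain Python, where the statistics are learned from the training column and then reused for the test column. The helper names and the income values are made up for this example.

```python
import statistics

def fit_min_max(column):
    # Learn the minimum and the range from the training column only.
    lo, hi = min(column), max(column)
    return lo, (hi - lo) or 1.0  # fall back to 1.0 for a constant column

def apply_min_max(column, lo, span):
    return [(v - lo) / span for v in column]

def fit_z_score(column):
    # Learn the mean and population standard deviation from the training column.
    mean = statistics.fmean(column)
    std = statistics.pstdev(column) or 1.0
    return mean, std

def apply_z_score(column, mean, std):
    return [(v - mean) / std for v in column]

train_income = [20000.0, 40000.0, 60000.0]
test_income = [50000.0, 80000.0]

lo, span = fit_min_max(train_income)
print(apply_min_max(train_income, lo, span))  # [0.0, 0.5, 1.0]
print(apply_min_max(test_income, lo, span))   # [0.75, 1.5]

mean, std = fit_z_score(train_income)
print(apply_z_score(test_income, mean, std))
```

Note that a test value can land outside the 0 to 1 range (1.5 here) when it exceeds the training maximum. That is expected when the scaler is fitted on training data alone.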
""")

st.subheader("Weighted KNN")
st.write("""
In Weighted KNN, closer neighbors have more influence on the result, for example by weighting each neighbor by the inverse of its distance. This can improve accuracy, especially on noisy or unevenly distributed data.
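
The sketch below contrasts a plain vote with a distance-weighted vote on toy data where the two disagree. It assumes inverse-distance weights; `knn_vote` and its `eps` parameter are hypothetical names for this example.

```python
import math
from collections import defaultdict

def knn_vote(train_X, train_y, query, k, weights="uniform", eps=1e-9):
    nearest = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    scores = defaultdict(float)
    for d, label in nearest:
        # Uniform: every neighbor counts as 1. Distance: closer neighbors
        # count more (eps guards against division by zero).
        scores[label] += 1.0 if weights == "uniform" else 1.0 / (d + eps)
    return max(scores, key=scores.get)

train_X = [(1.0, 1.0), (4.0, 4.0), (4.0, 5.0)]
train_y = ["red", "blue", "blue"]
query = (1.5, 1.5)

print(knn_vote(train_X, train_y, query, k=3, weights="uniform"))   # blue
print(knn_vote(train_X, train_y, query, k=3, weights="distance"))  # red
```

With uniform weights the two distant blue points outvote the single nearby red point. With distance weights the nearby red point wins.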
""")

st.subheader("Decision Boundaries")
st.write("""
KNN creates boundaries based on training data:

- Small `k` = complex, sensitive regions (risk of overfitting).
- Large `k` = smoother regions (risk of underfitting).
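
To make the effect of `k` concrete, the sketch below places one mislabeled point inside a cluster and classifies a nearby query with a small and a larger `k`. The data and the `knn_classify` helper are invented for this illustration.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    # train is a list of (features, label) pairs.
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [
    ((1.0, 1.0), "a"), ((1.0, 2.0), "a"), ((2.0, 1.0), "a"), ((2.0, 2.0), "a"),
    ((1.5, 1.5), "b"),  # noisy label inside the "a" cluster
    ((8.0, 8.0), "b"), ((8.0, 9.0), "b"), ((9.0, 8.0), "b"),
]
query = (1.6, 1.6)

print(knn_classify(train, query, k=1))  # b (follows the noisy point)
print(knn_classify(train, query, k=5))  # a (the surrounding cluster outvotes it)
```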
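""")

st.subheader("Cross-Validation")
st.write("""
Cross-validation estimates how well a model performs on unseen data. For example:

- **K-Fold CV** splits the data into K folds; each fold serves once as the validation set while the model is fit on the remaining folds. (This K is unrelated to the `k` in KNN.)
- Averaging the fold scores gives a more reliable estimate of generalization than a single train/test split.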
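
A minimal sketch of K-Fold CV used to compare values of `k`, written in plain Python. It assumes simple interleaved folds with no shuffling, and the helper names and toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    # train is a list of (features, label) pairs.
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def k_fold_accuracy(data, k_neighbors, n_folds):
    # Split the rows into n_folds interleaved folds; each fold is the
    # validation set once while the remaining rows act as training data.
    folds = [data[i::n_folds] for i in range(n_folds)]
    correct = 0
    for i, validation in enumerate(folds):
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        correct += sum(
            knn_predict(training, x, k_neighbors) == y for x, y in validation
        )
    return correct / len(data)

data = [
    ((1, 1), "a"), ((8, 8), "b"), ((1, 2), "a"), ((8, 9), "b"),
    ((2, 1), "a"), ((9, 8), "b"), ((2, 2), "a"), ((9, 9), "b"),
]

for k in (1, 3, 5):
    print(k, k_fold_accuracy(data, k, n_folds=4))
# 1 1.0
# 3 1.0
# 5 0.0
```

With `k = 5` every prediction fails on this tiny dataset: each training split keeps only two points of the query's own class, so three of the five neighbors always come from the other class. Real datasets are usually shuffled, and often stratified, before the folds are formed.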
""")

st.subheader("Tuning Methods")
st.write("""
- **Grid Search**: Tests every combination in a predefined parameter grid.
- **Random Search**: Samples random combinations, which is faster when the grid is large.
- **Bayesian Search**: Uses the results of previous trials to choose which parameters to try next.
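
The first two methods can be sketched in plain Python as shown below, using leave-one-out accuracy as the score. The `loo_accuracy` helper, the toy data, and the parameter grid are assumptions for this example; Bayesian search is not shown because it needs a surrogate model that does not fit in a short sketch.

```python
import itertools
import math
import random
from collections import defaultdict

def loo_accuracy(data, k, weights):
    # Leave-one-out: predict each row from all of the other rows.
    correct = 0
    for i, (query, truth) in enumerate(data):
        others = data[:i] + data[i + 1:]
        nearest = sorted(others, key=lambda row: math.dist(row[0], query))[:k]
        scores = defaultdict(float)
        for x, label in nearest:
            d = math.dist(x, query)
            scores[label] += 1.0 if weights == "uniform" else 1.0 / (d + 1e-9)
        correct += max(scores, key=scores.get) == truth
    return correct / len(data)

data = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"), ((2, 2), "a"),
    ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b"), ((9, 9), "b"),
]
grid = list(itertools.product([1, 3, 5, 7], ["uniform", "distance"]))

# Grid search: score every combination in the grid.
best_grid = max(grid, key=lambda params: loo_accuracy(data, *params))

# Random search: score only a random subset of the combinations.
rng = random.Random(0)
sampled = rng.sample(grid, 3)
best_random = max(sampled, key=lambda params: loo_accuracy(data, *params))

print(best_grid, loo_accuracy(data, *best_grid))
print(best_random, loo_accuracy(data, *best_random))
```

Random search evaluates fewer candidates, so its best score can never exceed the grid search score on the same grid, but it costs proportionally less to run.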
""")

st.markdown("<h2 style='color: #1a237e;'>KNN Implementation Notebook</h2>", unsafe_allow_html=True)
st.markdown(
    "<a href='https://colab.research.google.com/drive/11wk6wt7sZImXhTqzYrre3ic4oj3KFC4M?usp=sharing' target='_blank' style='font-size: 16px; color: #1a237e;'>Click here to open the Colab notebook</a>",
    unsafe_allow_html=True
)

st.write("""
KNN is intuitive and effective when combined with proper preprocessing and hyperparameter tuning. Use cross-validation to find the sweet spot and avoid overfitting or underfitting.
""")