import streamlit as st

# Page configuration
st.set_page_config(page_title="KNN Overview", page_icon="π", layout="wide")

# Custom CSS styling for a cleaner, light-colored interface
st.markdown("""
<style>
.stApp {
    background-color: #f2f6fa;
}
h1, h2, h3 {
    color: #1a237e;
}
.custom-font, p {
    font-family: 'Arial', sans-serif;
    font-size: 18px;
    color: #212121;
    line-height: 1.6;
}
</style>
""", unsafe_allow_html=True)
# Title
st.markdown("<h1 style='color: #1a237e;'>Understanding K-Nearest Neighbors (KNN)</h1>", unsafe_allow_html=True)

# Introduction to KNN
st.write("""
K-Nearest Neighbors (KNN) is a fundamental machine learning method suitable for both **classification** and **regression** problems. It makes predictions by analyzing the `K` closest data points in the training set.

Key features:
- KNN is a non-parametric model.
- It is a "lazy learner": it memorizes the training data instead of fitting explicit parameters.
- Distance metrics such as **Euclidean** and **Manhattan** determine similarity between data points.
""")
# How KNN Works
st.markdown("<h2 style='color: #1a237e;'>How KNN Functions</h2>", unsafe_allow_html=True)

st.subheader("Training Phase")
st.write("""
- KNN doesn't train a model in the traditional sense.
- It stores the dataset and uses it during prediction.
""")

st.subheader("Prediction - Classification")
st.write("""
1. Set the value of `k`.
2. Calculate the distance between the input and each point in the training data.
3. Identify the `k` nearest neighbors.
4. Use majority voting to assign the class label.
""")
st.subheader("Prediction - Regression") | |
st.write(""" | |
1. Choose `k`. | |
2. Find the distances to all training points. | |
3. Pick the closest `k` neighbors. | |
4. Predict using the **average** or **weighted average** of their values. | |
""") | |
# Overfitting and Underfitting
st.subheader("Model Behavior")
st.write("""
- **Overfitting**: Occurs with very small `k`, where the model fits noise in the training data.
- **Underfitting**: Occurs with very large `k`, where the model oversimplifies and misses real patterns.
- **Optimal Fit**: Found by balancing the two, typically via cross-validation.
""")
# Training vs CV Error
st.subheader("Error Analysis")
st.write("""
- **Training Error**: Error on the dataset used for fitting.
- **Cross-Validation Error**: Error on held-out validation data.
- A well-fitted model shows low error on both, with only a small gap between them.
""")
# Hyperparameter Tuning
st.subheader("Hyperparameter Choices")
st.write("""
Important tuning options for KNN include:
- `k`: Number of neighbors
- `weights`: `uniform` or `distance`
- `metric`: Distance formula such as Euclidean or Manhattan
- `n_jobs`: Parallel processing support
""")
# Scaling
st.subheader("Why Scaling is Crucial")
st.write("""
KNN relies heavily on distances, so it's essential to scale features. Use:
- **Min-Max Normalization** to compress values between 0 and 1.
- **Z-score Standardization** to center data around zero with unit variance.

Fit the scaler on the training data only, then apply the same transformation to the test data; fitting it separately on each split leaks information and distorts distances.
""")
# Weighted KNN
st.subheader("Weighted KNN")
st.write("""
In Weighted KNN, closer neighbors have more influence on the result. This often improves accuracy, especially on noisy or unevenly distributed data.
""")
# Decision Regions
st.subheader("Decision Boundaries")
st.write("""
KNN creates boundaries based on training data:
- Small `k` = complex, sensitive regions (risk of overfitting).
- Large `k` = smoother regions (risk of underfitting).
""")
# Cross Validation
st.subheader("Cross-Validation")
st.write("""
Cross-validation helps evaluate models reliably. For example:
- **K-Fold CV** splits the data into `k` folds, trains on `k - 1` of them, and validates on the remaining fold, rotating until every fold has served as the validation set.
- This gives a more stable estimate of how well the model generalizes.
""")
# Hyperparameter Optimization Techniques
st.subheader("Tuning Methods")
st.write("""
- **Grid Search**: Exhaustively tests every combination of the given parameters.
- **Random Search**: Samples random combinations, which is faster on large grids.
- **Bayesian Optimization**: Uses previous results to choose more promising parameters to try next.
""")
# Notebook Link
st.markdown("<h2 style='color: #1a237e;'>KNN Implementation Notebook</h2>", unsafe_allow_html=True)
st.markdown(
    "<a href='https://colab.research.google.com/drive/11wk6wt7sZImXhTqzYrre3ic4oj3KFC4M?usp=sharing' target='_blank' style='font-size: 16px; color: #1a237e;'>Click here to open the Colab notebook</a>",
    unsafe_allow_html=True
)
st.write(""" | |
KNN is intuitive and effective when combined with proper preprocessing and hyperparameter tuning. Use cross-validation to find the sweet spot and avoid overfitting or underfitting. | |
""") | |