import streamlit as st
# Page configuration
st.set_page_config(page_title="KNN Overview", page_icon="π", layout="wide")
# Custom CSS styling for a cleaner, light-colored interface
st.markdown("""
<style>
.stApp {
background-color: #f2f6fa;
}
h1, h2, h3 {
color: #1a237e;
}
.custom-font, p {
font-family: 'Arial', sans-serif;
font-size: 18px;
color: #212121;
line-height: 1.6;
}
</style>
""", unsafe_allow_html=True)
# Title
st.markdown("<h1 style='color: #1a237e;'>Understanding K-Nearest Neighbors (KNN)</h1>", unsafe_allow_html=True)
# Introduction to KNN
st.write("""
K-Nearest Neighbors (KNN) is a fundamental machine learning method suitable for both **classification** and **regression** problems. It makes predictions by analyzing the `k` closest data points in the training set.
Key features:
- KNN is a **non-parametric**, instance-based model.
- It memorizes the training data instead of fitting explicit model parameters.
- Distance metrics such as **Euclidean** or **Manhattan** measure similarity between data points.
""")
# How KNN Works
st.markdown("<h2 style='color: #1a237e;'>How KNN Functions</h2>", unsafe_allow_html=True)
st.subheader("Training Phase")
st.write("""
- KNN doesn't train a model in the traditional sense.
- It stores the dataset and uses it during prediction.
""")
st.subheader("Prediction - Classification")
st.write("""
1. Set the value of `k`.
2. Calculate the distance between the input and each point in the training data.
3. Identify the `k` nearest neighbors.
4. Use majority voting to assign the class label.
""")
st.subheader("Prediction - Regression")
st.write("""
1. Choose `k`.
2. Find the distances to all training points.
3. Pick the closest `k` neighbors.
4. Predict using the **average** or **weighted average** of their values.
""")
# Overfitting and Underfitting
st.subheader("Model Behavior")
st.write("""
- **Overfitting**: Very low `k` values (e.g., `k = 1`) let the model chase noise in the training data.
- **Underfitting**: Very high `k` values over-smooth the predictions and miss real structure.
- **Optimal Fit**: Found by balancing the two, typically via cross-validation.
""")
# Training vs CV Error
st.subheader("Error Analysis")
st.write("""
- **Training Error**: Error on the dataset used for fitting.
- **Cross-Validation Error**: Error on separate validation data.
- A well-tuned model shows low error on both, with only a small gap between them.
""")
# Hyperparameter Tuning
st.subheader("Hyperparameter Choices")
st.write("""
Important tuning options for KNN include:
- `k`: Number of neighbors
- `weights`: `uniform` or `distance`
- `metric`: Distance formula like Euclidean or Manhattan
- `n_jobs`: Parallel processing support
""")
# Scaling
st.subheader("Why Scaling is Crucial")
st.write("""
KNN relies heavily on distances, so it's essential to scale features. Use:
- **Min-Max Normalization** to compress values between 0 and 1.
- **Z-score Standardization** to center data.
Fit the scaler on the training data only, then apply the same transformation to the test data to avoid leakage.
""")
# Weighted KNN
st.subheader("Weighted KNN")
st.write("""
In weighted KNN, closer neighbors have more influence on the prediction, typically via inverse-distance weights. This often improves accuracy on noisy or unevenly distributed data.
""")
# Decision Regions
st.subheader("Decision Boundaries")
st.write("""
KNN creates boundaries based on training data:
- Small `k` = complex, sensitive regions (risk of overfitting).
- Large `k` = smoother regions (risk of underfitting).
""")
# Cross Validation
st.subheader("Cross-Validation")
st.write("""
Cross-validation helps evaluate models effectively. For example:
- **K-Fold CV** splits the data into `k` folds, trains on `k - 1` of them, and validates on the held-out fold, rotating until every fold has been used once.
- Averaging the fold scores gives a more reliable estimate of how well the model generalizes.
""")
# Hyperparameter Optimization Techniques
st.subheader("Tuning Methods")
st.write("""
- **Grid Search**: Tests all combinations of parameters.
- **Random Search**: Picks random combinations for faster tuning.
- **Bayesian Optimization**: Uses results from earlier trials to choose the next, more promising parameter values.
""")
# Notebook Link
st.markdown("<h2 style='color: #1a237e;'>KNN Implementation Notebook</h2>", unsafe_allow_html=True)
st.markdown(
"<a href='https://colab.research.google.com/drive/11wk6wt7sZImXhTqzYrre3ic4oj3KFC4M?usp=sharing' target='_blank' style='font-size: 16px; color: #1a237e;'>Click here to open the Colab notebook</a>",
unsafe_allow_html=True
)
st.write("""
KNN is intuitive and effective when combined with proper preprocessing and hyperparameter tuning. Use cross-validation to find the sweet spot and avoid overfitting or underfitting.
""")