Sathwikchowdary committed on
Commit e6e1c1e · verified · 1 Parent(s): 6d250ee

Rename pages/KNN Alogrithm.py to pages/15KNN Alogrithm.py

Files changed (2)
  1. pages/15KNN Alogrithm.py +137 -0
  2. pages/KNN Alogrithm.py +0 -0
pages/15KNN Alogrithm.py ADDED
@@ -0,0 +1,137 @@
+ import streamlit as st
+
+ # Page configuration
+ st.set_page_config(page_title="KNN Overview", page_icon="📊", layout="wide")
+
+ # Custom CSS styling for a cleaner, light-colored interface
+ st.markdown("""
+ <style>
+ .stApp {
+     background-color: #f2f6fa;
+ }
+ h1, h2, h3 {
+     color: #1a237e;
+ }
+ .custom-font, p {
+     font-family: 'Arial', sans-serif;
+     font-size: 18px;
+     color: #212121;
+     line-height: 1.6;
+ }
+ </style>
+ """, unsafe_allow_html=True)
+
+ # Title
+ st.markdown("<h1 style='color: #1a237e;'>Understanding K-Nearest Neighbors (KNN)</h1>", unsafe_allow_html=True)
+
+ # Introduction to KNN
+ st.write("""
+ K-Nearest Neighbors (KNN) is a fundamental machine learning method suitable for both **classification** and **regression** problems. It makes predictions by analyzing the `K` closest data points in the training set.
+
+ Key features:
+ - KNN is a non-parametric model.
+ - It memorizes the training data instead of learning model parameters.
+ - Distance metrics such as **Euclidean distance** quantify how similar two data points are (see the snippet below).
+ """)
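+
+ # Illustrative sketch (added for this overview, not from the notebook):
+ # computing a Euclidean distance with NumPy, using made-up points.
+ st.code("""
+ import numpy as np
+
+ a = np.array([1.0, 2.0])   # hypothetical point A
+ b = np.array([4.0, 6.0])   # hypothetical point B
+
+ # Euclidean distance = sqrt(sum of squared feature differences)
+ dist = np.sqrt(np.sum((a - b) ** 2))   # -> 5.0 for these points
+ """, language="python")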
+
+ # How KNN Works
+ st.markdown("<h2 style='color: #1a237e;'>How KNN Functions</h2>", unsafe_allow_html=True)
+
+ st.subheader("Training Phase")
+ st.write("""
+ - KNN doesn't train a model in the traditional sense.
+ - It simply stores the dataset and defers all work to prediction time.
+ """)
+
+ st.subheader("Prediction - Classification")
+ st.write("""
+ 1. Set the value of `k`.
+ 2. Calculate the distance between the input and each point in the training data.
+ 3. Identify the `k` nearest neighbors.
+ 4. Use majority voting to assign the class label (sketched in code below).
+ """)
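+
+ # A minimal scikit-learn sketch of the steps above (assumes scikit-learn is
+ # installed; X_train, y_train, X_new are placeholder arrays, not app state).
+ st.code("""
+ from sklearn.neighbors import KNeighborsClassifier
+
+ # Step 1: choose k; steps 2-4 (distances, neighbor lookup, majority vote)
+ # all happen inside predict().
+ knn = KNeighborsClassifier(n_neighbors=5)
+ knn.fit(X_train, y_train)    # "training" just stores the data
+ labels = knn.predict(X_new)  # majority vote among the 5 nearest neighbors
+ """, language="python")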
+
+ st.subheader("Prediction - Regression")
+ st.write("""
+ 1. Choose `k`.
+ 2. Find the distances to all training points.
+ 3. Pick the closest `k` neighbors.
+ 4. Predict using the **average** or **weighted average** of their values.
+ """)
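+
+ # The matching regression sketch (same placeholder names as above):
+ st.code("""
+ from sklearn.neighbors import KNeighborsRegressor
+
+ # weights="uniform" takes a plain average of the k neighbors' targets;
+ # weights="distance" takes a weighted average favoring closer neighbors.
+ reg = KNeighborsRegressor(n_neighbors=5, weights="uniform")
+ reg.fit(X_train, y_train)
+ values = reg.predict(X_new)
+ """, language="python")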
+
+ # Overfitting and Underfitting
+ st.subheader("Model Behavior")
+ st.write("""
+ - **Overfitting**: the model captures noise in the training data; in KNN this typically happens with very small `k` (e.g. `k = 1`).
+ - **Underfitting**: the model oversimplifies; in KNN this typically happens with very large `k`.
+ - **Optimal Fit**: found by balancing the two, often via cross-validation.
+ """)
+
+ # Training vs CV Error
+ st.subheader("Error Analysis")
+ st.write("""
+ - **Training Error**: error measured on the data used for fitting.
+ - **Cross-Validation Error**: error measured on held-out validation data.
+ - A well-fitted model shows low error on both, with a small gap between them (compare them across `k` as below).
+ """)
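+
+ # One assumed way to surface this behavior: compare training accuracy with
+ # cross-validated accuracy across several k values (placeholder data names).
+ st.code("""
+ from sklearn.model_selection import cross_val_score
+ from sklearn.neighbors import KNeighborsClassifier
+
+ for k in [1, 3, 5, 11, 21]:
+     knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
+     train_acc = knn.score(X_train, y_train)
+     cv_acc = cross_val_score(knn, X_train, y_train, cv=5).mean()
+     # very small k: train_acc near 1.0 with a large gap to cv_acc (overfit);
+     # very large k: both scores drop (underfit)
+     print(k, train_acc, cv_acc)
+ """, language="python")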
+
+ # Hyperparameter Tuning
+ st.subheader("Hyperparameter Choices")
+ st.write("""
+ Important tuning options for KNN include:
+ - `k`: number of neighbors
+ - `weights`: `uniform` or `distance`
+ - `metric`: distance formula, e.g. Euclidean or Manhattan
+ - `n_jobs`: parallel processing (a performance setting rather than a model hyperparameter)
+ """)
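+
+ # How these options map onto scikit-learn's estimator (a sketch, with
+ # arbitrarily chosen values):
+ st.code("""
+ from sklearn.neighbors import KNeighborsClassifier
+
+ knn = KNeighborsClassifier(
+     n_neighbors=7,        # k
+     weights="distance",   # or "uniform"
+     metric="manhattan",   # or "euclidean", "minkowski", ...
+     n_jobs=-1,            # use all CPU cores for the neighbor search
+ )
+ """, language="python")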
+
+ # Scaling
+ st.subheader("Why Scaling is Crucial")
+ st.write("""
+ KNN relies heavily on distances, so features must be on comparable scales. Common options:
+ - **Min-Max Normalization** to compress values between 0 and 1.
+ - **Z-score Standardization** to center data at mean 0 with unit variance.
+
+ Fit the scaler on the training data only, then apply that same fitted scaler to the test data; never fit it on the test set.
+ """)
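+
+ # A sketch of that workflow with scikit-learn's StandardScaler
+ # (placeholder train/test arrays):
+ st.code("""
+ from sklearn.preprocessing import StandardScaler
+
+ scaler = StandardScaler()
+ X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from train only
+ X_test_scaled = scaler.transform(X_test)        # reuse the same statistics
+ """, language="python")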
+
+ # Weighted KNN
+ st.subheader("Weighted KNN")
+ st.write("""
+ In Weighted KNN, closer neighbors have more influence on the result, typically via inverse-distance weights. This often improves accuracy on noisy or unevenly distributed data (see the toy example below).
+ """)
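+
+ # A toy, hand-rolled inverse-distance weighting example (made-up numbers):
+ st.code("""
+ import numpy as np
+
+ distances = np.array([0.5, 1.0, 2.0])   # distances to the 3 nearest neighbors
+ targets = np.array([10.0, 20.0, 40.0])  # those neighbors' target values
+
+ w = 1.0 / distances                           # closer neighbor -> larger weight
+ prediction = np.sum(w * targets) / np.sum(w)  # weighted average, about 17.1
+ """, language="python")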
+
+ # Decision Regions
+ st.subheader("Decision Boundaries")
+ st.write("""
+ KNN creates boundaries based on training data:
+ - Small `k` = complex, sensitive regions (risk of overfitting).
+ - Large `k` = smoother regions (risk of underfitting).
+ """)
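+
+ # A rough sketch of visualizing decision regions for one k value (assumes
+ # 2-D features; X, y are placeholder arrays and matplotlib is installed):
+ st.code("""
+ import numpy as np
+ import matplotlib.pyplot as plt
+ from sklearn.neighbors import KNeighborsClassifier
+
+ knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)  # try 1 vs 15 and compare
+
+ # Classify every point on a grid and shade the plane by predicted class
+ xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 200),
+                      np.linspace(X[:, 1].min(), X[:, 1].max(), 200))
+ Z = knn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
+ plt.contourf(xx, yy, Z, alpha=0.3)
+ plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
+ """, language="python")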
+
+ # Cross Validation
+ st.subheader("Cross-Validation")
+ st.write("""
+ Cross-validation gives a more reliable estimate of generalization than a single train/test split. For example:
+ - **K-Fold CV** splits the data into `K` parts, trains on `K-1` of them, and validates on the held-out part, rotating through all parts.
+ - The averaged score estimates how well the model generalizes.
+ """)
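+
+ # Minimal 5-fold cross-validation sketch (placeholder X, y):
+ st.code("""
+ from sklearn.model_selection import cross_val_score
+ from sklearn.neighbors import KNeighborsClassifier
+
+ scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
+ print(scores.mean(), scores.std())  # averaged estimate of generalization
+ """, language="python")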
+
+ # Hyperparameter Optimization Techniques
+ st.subheader("Tuning Methods")
+ st.write("""
+ - **Grid Search**: exhaustively tests every combination of the listed parameter values.
+ - **Random Search**: samples random combinations, which is faster on large search spaces.
+ - **Bayesian Optimization**: uses previous results to choose the next promising parameter combination.
+ """)
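+
+ # A grid-search sketch over the KNN options listed earlier (assumes
+ # scikit-learn; the parameter values are arbitrary examples):
+ st.code("""
+ from sklearn.model_selection import GridSearchCV
+ from sklearn.neighbors import KNeighborsClassifier
+
+ grid = GridSearchCV(
+     KNeighborsClassifier(),
+     param_grid={"n_neighbors": [3, 5, 7, 11],
+                 "weights": ["uniform", "distance"],
+                 "metric": ["euclidean", "manhattan"]},
+     cv=5,
+ )
+ grid.fit(X_train, y_train)
+ print(grid.best_params_, grid.best_score_)
+ """, language="python")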
+
+ # Notebook Link
+ st.markdown("<h2 style='color: #1a237e;'>KNN Implementation Notebook</h2>", unsafe_allow_html=True)
+ st.markdown(
+     "<a href='https://colab.research.google.com/drive/11wk6wt7sZImXhTqzYrre3ic4oj3KFC4M?usp=sharing' target='_blank' style='font-size: 16px; color: #1a237e;'>Click here to open the Colab notebook</a>",
+     unsafe_allow_html=True
+ )
+
+ st.write("""
+ KNN is intuitive and effective when combined with proper preprocessing and hyperparameter tuning. Use cross-validation to find the sweet spot and avoid overfitting or underfitting.
+ """)
pages/KNN Alogrithm.py DELETED
File without changes