import streamlit as st

# Page configuration
st.set_page_config(page_title="KNN Overview", page_icon="π", layout="wide")

# Custom CSS styling for a cleaner, light-colored interface
st.markdown("""
<style>
.stApp {
    background-color: #f2f6fa;
}
h1, h2, h3 {
    color: #1a237e;
}
.custom-font, p {
    font-family: 'Arial', sans-serif;
    font-size: 18px;
    color: #212121;
    line-height: 1.6;
}
</style>
""", unsafe_allow_html=True)
# Title
st.markdown("<h1 style='color: #1a237e;'>Understanding K-Nearest Neighbors (KNN)</h1>", unsafe_allow_html=True)

# Introduction to KNN
st.write("""
K-Nearest Neighbors (KNN) is a fundamental machine learning method suitable for both **classification** and **regression** problems. It makes predictions by analyzing the `K` closest data points in the training set.

Key features:
- KNN is a non-parametric model.
- It is a "lazy learner": it memorizes the training data instead of fitting explicit parameters.
- Distance metrics such as **Euclidean** and **Manhattan** determine similarity between data points.
""")
# How KNN Works
st.markdown("<h2 style='color: #1a237e;'>How KNN Functions</h2>", unsafe_allow_html=True)

st.subheader("Training Phase")
st.write("""
- KNN doesn't train a model in the traditional sense.
- It stores the dataset and uses it during prediction.
""")

st.subheader("Prediction - Classification")
st.write("""
1. Set the value of `k`.
2. Calculate the distance between the input and each point in the training data.
3. Identify the `k` nearest neighbors.
4. Use majority voting to assign the class label.
""")
st.subheader("Prediction - Regression") | |
st.write(""" | |
1. Choose `k`. | |
2. Find the distances to all training points. | |
3. Pick the closest `k` neighbors. | |
4. Predict using the **average** or **weighted average** of their values. | |
""") | |
# Overfitting and Underfitting
st.subheader("Model Behavior")
st.write("""
- **Overfitting**: Occurs with very small `k`, where the model fits noise in the training data.
- **Underfitting**: Occurs with very large `k`, where the model oversimplifies and misses real patterns.
- **Optimal Fit**: Found by balancing the two, typically via cross-validation.
""")
# Training vs CV Error
st.subheader("Error Analysis")
st.write("""
- **Training Error**: Error on the dataset used for fitting.
- **Cross-Validation Error**: Error on held-out validation data.
- A well-fitted model shows low error on both, with only a small gap between them.
""")
# Hyperparameter Tuning
st.subheader("Hyperparameter Choices")
st.write("""
Important tuning options for KNN include:
- `k`: Number of neighbors
- `weights`: `uniform` or `distance`
- `metric`: Distance formula such as Euclidean or Manhattan
- `n_jobs`: Parallel processing support
""")
# Scaling
st.subheader("Why Scaling is Crucial")
st.write("""
KNN relies heavily on distances, so it's essential to scale features. Use:
- **Min-Max Normalization** to compress values between 0 and 1.
- **Z-score Standardization** to center data around zero with unit variance.

Fit the scaler on the training data only, then apply the same transformation to the test data; fitting it separately on each split leaks information and distorts distances.
""")
# Weighted KNN
st.subheader("Weighted KNN")
st.write("""
In Weighted KNN, closer neighbors have more influence on the result. This often improves accuracy, especially on noisy or unevenly distributed data.
""")
# Decision Regions
st.subheader("Decision Boundaries")
st.write("""
KNN creates boundaries based on training data:
- Small `k` = complex, sensitive regions (risk of overfitting).
- Large `k` = smoother regions (risk of underfitting).
""")
# Cross Validation
st.subheader("Cross-Validation")
st.write("""
Cross-validation helps evaluate models reliably. For example:
- **K-Fold CV** splits the data into `k` folds, trains on `k - 1` of them, and validates on the remaining fold, rotating until every fold has served as the validation set.
- This gives a more stable estimate of how well the model generalizes.
""")
# Hyperparameter Optimization Techniques
st.subheader("Tuning Methods")
st.write("""
- **Grid Search**: Exhaustively tests every combination of the given parameters.
- **Random Search**: Samples random combinations, which is faster on large grids.
- **Bayesian Optimization**: Uses previous results to choose more promising parameters to try next.
""")
# Notebook Link
st.markdown("<h2 style='color: #1a237e;'>KNN Implementation Notebook</h2>", unsafe_allow_html=True)
st.markdown(
    "<a href='https://colab.research.google.com/drive/11wk6wt7sZImXhTqzYrre3ic4oj3KFC4M?usp=sharing' target='_blank' style='font-size: 16px; color: #1a237e;'>Click here to open the Colab notebook</a>",
    unsafe_allow_html=True
)
st.write(""" | |
KNN is intuitive and effective when combined with proper preprocessing and hyperparameter tuning. Use cross-validation to find the sweet spot and avoid overfitting or underfitting. | |
""") | |