Rudra Rahul Chothe committed on
Commit 38a1194 · verified · 1 Parent(s): c695c19

Upload folder using huggingface_hub

Files changed (4)
  1. .gitignore +92 -93
  2. README.md +117 -133
  3. app.py +50 -50
  4. requirements.txt +7 -10
.gitignore CHANGED
@@ -1,94 +1,93 @@
- # Python
- __pycache__/
- *.py[cod]
- *$py.class
- *.so
- .Python
- build/
- develop-eggs/
- dist/
- downloads/
- eggs/
- .eggs/
- lib/
- lib64/
- parts/
- sdist/
- var/
- wheels/
- *.egg-info/
- .installed.cfg
- *.egg
-
- # Virtual Environment
- venv/
- env/
- ENV/
- .env
- .venv
- env.bak/
- venv.bak/
-
- # IDE specific files
- .idea/
- .vscode/
- *.swp
- *.swo
- .project
- .pydevproject
- .settings/
-
- # Project specific
- # data/
- # *.pkl
- # temp_query_image.jpg
- # embeddings.pkl
- *.h5
- src/__init__.py
- models/
- temp/
- logs/
-
- # OS specific
- .DS_Store
- Thumbs.db
- *.db
- *.sqlite3
-
- # Jupyter Notebook
- .ipynb_checkpoints
- *.ipynb
-
- # Distribution / packaging
- .Python
- *.manifest
- *.spec
- pip-log.txt
- pip-delete-this-directory.txt
-
- # Unit test / coverage reports
- htmlcov/
- .tox/
- .coverage
- .coverage.*
- .cache
- nosetests.xml
- coverage.xml
- *.cover
- .hypothesis/
- .pytest_cache/
-
- # Logs
- *.log
- local_settings.py
- db.sqlite3
-
- # Environment variables
- .env
- .env.local
- .env.*.local
-
- # Docker
- Dockerfile
- docker-compose.yml
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual Environment
+ venv/
+ env/
+ ENV/
+ .env
+ .venv
+ env.bak/
+ venv.bak/
+
+ # IDE specific files
+ .idea/
+ .vscode/
+ *.swp
+ *.swo
+ .project
+ .pydevproject
+ .settings/
+
+ # Project specific
+ # data/
+ # *.pkl
+ # temp_query_image.jpg
+ # embeddings.pkl
+ *.h5
+ models/
+ temp/
+ logs/
+
+ # OS specific
+ .DS_Store
+ Thumbs.db
+ *.db
+ *.sqlite3
+
+ # Jupyter Notebook
+ .ipynb_checkpoints
+ *.ipynb
+
+ # Distribution / packaging
+ .Python
+ *.manifest
+ *.spec
+ pip-log.txt
+ pip-delete-this-directory.txt
+
+ # Unit test / coverage reports
+ htmlcov/
+ .tox/
+ .coverage
+ .coverage.*
+ .cache
+ nosetests.xml
+ coverage.xml
+ *.cover
+ .hypothesis/
+ .pytest_cache/
+
+ # Logs
+ *.log
+ local_settings.py
+ db.sqlite3
+
+ # Environment variables
+ .env
+ .env.local
+ .env.*.local
+
+ # Docker
+ Dockerfile
+ docker-compose.yml
  .docker/
README.md CHANGED
@@ -1,133 +1,117 @@
- ---
- language: en
- license: mit
- tags:
- - image-search
- - machine-learning
- title: Image Search Engine Fashion
- sdk: streamlit
- emoji: 💻
- colorFrom: blue
- colorTo: pink
- ---
-
- ---
- title: Image Search Engine Fashion
- emoji: 🔍
- colorFrom: blue
- colorTo: pink
- sdk: streamlit
- sdk_version: 1.27.2
- app_file: app.py
- pinned: false
- ---
-
- ## Image Similarity Search Engine
- A deep learning-based image similarity search engine that uses EfficientNetB0 for feature extraction and FAISS for fast similarity search. The application provides a web interface built with Streamlit for easy interaction.
-
- Features
- - Deep Feature Extraction: Uses EfficientNetB0 (pre-trained on ImageNet) to extract meaningful features from images
- - Fast Similarity Search: Implements FAISS for efficient nearest-neighbor search
- - Interactive Web Interface: User-friendly interface built with Streamlit
- - Real-time Processing: Shows progress and time estimates during feature extraction
- - Scalable Architecture: Designed to handle large image datasets efficiently
-
- ## Installation
- ## Prerequisites
-
- Python 3.8 or higher
- pip package manager
-
- ## Setup
-
- 1. Clone the repository:
- ```
- git clone https://github.com/yourusername/image-similarity-search.git
- cd image-similarity-search
- ```
- 2. Create and activate a virtual environment:
- ```
- python -m venv venv
- source venv/bin/activate # On Windows use: venv\Scripts\activate
- ```
- 3. Install required packages:
- ```
- pip install -r requirements.txt
- ```
-
- ## Project Structure
- ```
- image-similarity-search/
- ├── data/
- │ ├── images/ # Directory for train dataset images
- │ ├── sample-test-images/ # Directory for test dataset images
- │ └── embeddings.pkl # Pre-computed image embeddings
- ├── src/
- │ ├── feature_extractor.py # EfficientNetB0 feature extraction
- │ ├── preprocessing.py # Image preprocessing and embedding computation
- │ ├── similarity_search.py # FAISS-based similarity search
- │ └── main.py # Streamlit web interface
- ├── requirements.txt
- ├── README.md
- └── .gitignore
- ```
- ## Usage
-
- 1. **Prepare Your Dataset:**
- Get training image dataset from drive:
- ```
- https://drive.google.com/file/d/1U2PljA7NE57jcSSzPs21ZurdIPXdYZtN/view?usp=drive_link
- ```
- Place your image dataset in the data/images directory
- Supported formats: JPG, JPEG, PNG
-
- 2. **Generate Embeddings:**
- ```
- python -m src.preprocessing
- ```
-
- **This will**:
- - Process all images in the dataset
- - Show progress and time estimates
- - Save embeddings to data/embeddings.pkl
-
- 3. **Run the Web Interface:**
- ```
- streamlit run src/main.py
- ```
-
- 4. Using the Interface:
-
- - Upload a query image using the file uploader
- - Click "Search Similar Images"
- - View the most similar images from your dataset
-
-
-
- ## Technical Details
- **Feature Extraction**
- - Uses EfficientNetB0 without top layers
- - Input image size: 224x224 pixels
- - Output feature dimension: 1280
-
- **Similarity Search**
- - Uses FAISS IndexFlatL2 for L2 distance-based search
- - Returns top-k most similar images (default k=5)
-
- **Web Interface**
- - Responsive design with Streamlit
- - Displays original and similar images with similarity scores
- - Progress tracking during processing
-
- **Dependencies**
- - TensorFlow 2.x
- - FAISS-cpu (or FAISS-gpu for GPU support)
- - Streamlit
- - Pillow
- - NumPy
- - tqdm
-
- **Performance**
- - Feature extraction: ~1 second per image on CPU
- - Similarity search: Near real-time for datasets up to 100k images
- - Memory usage depends on dataset size (approximately 5KB per image embedding)
+ ---
+ language: en
+ license: mit
+ tags:
+ - image-search
+ - machine-learning
+ ---
+
+ ## Image Similarity Search Engine
+ A deep learning-based image similarity search engine that uses EfficientNetB0 for feature extraction and FAISS for fast similarity search. The application provides a web interface built with Streamlit for easy interaction.
+
+ Features
+ - Deep Feature Extraction: Uses EfficientNetB0 (pre-trained on ImageNet) to extract meaningful features from images
+ - Fast Similarity Search: Implements FAISS for efficient nearest-neighbor search
+ - Interactive Web Interface: User-friendly interface built with Streamlit
+ - Real-time Processing: Shows progress and time estimates during feature extraction
+ - Scalable Architecture: Designed to handle large image datasets efficiently
+
+ ## Installation
+ ## Prerequisites
+
+ Python 3.8 or higher
+ pip package manager
+
+ ## Setup
+
+ 1. Clone the repository:
+ ```
+ git clone https://github.com/yourusername/image-similarity-search.git
+ cd image-similarity-search
+ ```
+ 2. Create and activate a virtual environment:
+ ```
+ python -m venv venv
+ source venv/bin/activate # On Windows use: venv\Scripts\activate
+ ```
+ 3. Install required packages:
+ ```
+ pip install -r requirements.txt
+ ```
+
+ ## Project Structure
+ ```
+ image-similarity-search/
+ ├── data/
+ │ ├── images/ # Directory for train dataset images
+ │ ├── sample-test-images/ # Directory for test dataset images
+ │ └── embeddings.pkl # Pre-computed image embeddings
+ ├── src/
+ │ ├── feature_extractor.py # EfficientNetB0 feature extraction
+ │ ├── preprocessing.py # Image preprocessing and embedding computation
+ │ ├── similarity_search.py # FAISS-based similarity search
+ │ └── main.py # Streamlit web interface
+ ├── requirements.txt
+ ├── README.md
+ └── .gitignore
+ ```
+ ## Usage
+
+ 1. **Prepare Your Dataset:**
+ Get training image dataset from drive:
+ ```
+ https://drive.google.com/file/d/1U2PljA7NE57jcSSzPs21ZurdIPXdYZtN/view?usp=drive_link
+ ```
+ Place your image dataset in the data/images directory
+ Supported formats: JPG, JPEG, PNG
+
+ 2. **Generate Embeddings:**
+ ```
+ python -m src.preprocessing
+ ```
+
+ **This will**:
+ - Process all images in the dataset
+ - Show progress and time estimates
+ - Save embeddings to data/embeddings.pkl
+
+ 3. **Run the Web Interface:**
+ ```
+ streamlit run src/main.py
+ ```
+
+ 4. Using the Interface:
+
+ - Upload a query image using the file uploader
+ - Click "Search Similar Images"
+ - View the most similar images from your dataset
+
+
+
+ ## Technical Details
+ **Feature Extraction**
+ - Uses EfficientNetB0 without top layers
+ - Input image size: 224x224 pixels
+ - Output feature dimension: 1280
+
+ **Similarity Search**
+ - Uses FAISS IndexFlatL2 for L2 distance-based search
+ - Returns top-k most similar images (default k=5)
+
+ **Web Interface**
+ - Responsive design with Streamlit
+ - Displays original and similar images with similarity scores
+ - Progress tracking during processing
+
+ **Dependencies**
+ - TensorFlow 2.x
+ - FAISS-cpu (or FAISS-gpu for GPU support)
+ - Streamlit
+ - Pillow
+ - NumPy
+ - tqdm
+
+ **Performance**
+ - Feature extraction: ~1 second per image on CPU
+ - Similarity search: Near real-time for datasets up to 100k images
+ - Memory usage depends on dataset size (approximately 5KB per image embedding)
 
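
Note on the README's "approximately 5KB per image embedding" figure: it is consistent with a 1280-dimensional vector if each value is stored as a 4-byte float32. The storage type is an assumption here (the README does not state it), and `embedding_bytes` is a hypothetical helper used only for the arithmetic.

```python
FEATURE_DIM = 1280       # output feature dimension stated in the README
BYTES_PER_FLOAT32 = 4    # assumption: embeddings are stored as float32


def embedding_bytes(num_images: int, dim: int = FEATURE_DIM) -> int:
    """Raw size in bytes of num_images embeddings of the given dimension."""
    return num_images * dim * BYTES_PER_FLOAT32


per_image = embedding_bytes(1)
total_100k = embedding_bytes(100_000)

print(f"{per_image} bytes per image ({per_image / 1024:.0f} KiB)")
print(f"{total_100k / 1024**2:.0f} MiB for 100k images")
```

This counts only the raw vectors; pickle overhead and the stored image paths would add to the real footprint.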
app.py CHANGED
@@ -1,50 +1,50 @@
- import streamlit as st
- from PIL import Image
- from src.feature_extractor import FeatureExtractor
- from src.similarity_search import SimilaritySearchEngine
-
- def main():
-     st.title('Image Similarity Search')
-
-     # Upload query image
-     uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "png", "jpeg"])
-
-     if uploaded_file is not None:
-         # Load the uploaded image
-         query_img = Image.open(uploaded_file)
-
-         # Resize and display the query image
-         query_img_resized = query_img.resize((263, 385))
-         st.image(query_img_resized, caption='Uploaded Image', use_container_width=False)
-
-         # Feature extraction and similarity search
-         if st.button("Search Similar Images"):
-             with st.spinner("Analyzing query image..."):
-                 try:
-                     # Initialize feature extractor and search engine
-                     extractor = FeatureExtractor()
-                     search_engine = SimilaritySearchEngine()
-
-                     # Save the uploaded image temporarily
-                     query_img_path = 'temp_query_image.jpg'
-                     query_img.save(query_img_path)
-
-                     # Extract features from the query image
-                     query_embedding = extractor.extract_features(query_img_path)
-
-                     # Perform similarity search
-                     similar_images, distances = search_engine.search_similar_images(query_embedding)
-
-                     # Display similar images
-                     st.subheader('Similar Images')
-                     cols = st.columns(len(similar_images))
-                     for i, (img_path, dist) in enumerate(zip(similar_images, distances)):
-                         with cols[i]:
-                             similar_img = Image.open(img_path).resize((375, 550))
-                             st.image(similar_img, caption=f'Distance: {dist:.2f}', use_container_width=True)
-
-                 except Exception as e:
-                     st.error(f"Error during similarity search: {e}")
-
- if __name__ == '__main__':
-     main()
+ import streamlit as st
+ from PIL import Image
+ from src.feature_extractor import FeatureExtractor
+ from src.similarity_search import SimilaritySearchEngine
+
+ def main():
+     st.title('Image Similarity Search')
+
+     # Upload query image
+     uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "png", "jpeg"])
+
+     if uploaded_file is not None:
+         # Load the uploaded image
+         query_img = Image.open(uploaded_file)
+
+         # Resize and display the query image
+         query_img_resized = query_img.resize((263, 385))
+         st.image(query_img_resized, caption='Uploaded Image', use_container_width=False)
+
+         # Feature extraction and similarity search
+         if st.button("Search Similar Images"):
+             with st.spinner("Analyzing query image..."):
+                 try:
+                     # Initialize feature extractor and search engine
+                     extractor = FeatureExtractor()
+                     search_engine = SimilaritySearchEngine()
+
+                     # Save the uploaded image temporarily
+                     query_img_path = 'temp_query_image.jpg'
+                     query_img.save(query_img_path)
+
+                     # Extract features from the query image
+                     query_embedding = extractor.extract_features(query_img_path)
+
+                     # Perform similarity search
+                     similar_images, distances = search_engine.search_similar_images(query_embedding)
+
+                     # Display similar images
+                     st.subheader('Similar Images')
+                     cols = st.columns(len(similar_images))
+                     for i, (img_path, dist) in enumerate(zip(similar_images, distances)):
+                         with cols[i]:
+                             similar_img = Image.open(img_path).resize((375, 550))
+                             st.image(similar_img, caption=f'Distance: {dist:.2f}', use_container_width=True)
+
+                 except Exception as e:
+                     st.error(f"Error during similarity search: {e}")
+
+ if __name__ == '__main__':
+     main()
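
Note on the "Save the uploaded image temporarily" step in `app.py`: the app writes every query to the fixed path `temp_query_image.jpg`, so two concurrent sessions could overwrite each other's file. One possible alternative, sketched here with only the standard library, is a uniquely named temporary file. `save_upload_to_temp` is a hypothetical helper, not part of this commit, and it writes raw bytes rather than calling `query_img.save` as the app does.

```python
import os
import tempfile


def save_upload_to_temp(data: bytes, suffix: str = ".jpg") -> str:
    """Write uploaded bytes to a uniquely named temporary file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(data)
        return tmp.name


# Placeholder bytes; in the app they might come from uploaded_file.getvalue().
path = save_upload_to_temp(b"not-a-real-image")
try:
    with open(path, "rb") as fh:
        content = fh.read()
finally:
    os.remove(path)  # clean up once the features have been extracted
```

Because `delete=False` is used, the caller is responsible for removing the file, which is why the usage wraps the read in `try`/`finally`.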
requirements.txt CHANGED
@@ -1,10 +1,7 @@
- tensorflow
- numpy
- opencv-python
- scikit-learn
- streamlit
- Pillow
- faiss-cpu
- python-dotenv
- matplotlib
- pandas
+ tensorflow
+ numpy
+ opencv-python
+ scikit-learn
+ streamlit
+ Pillow
+ faiss-cpu