|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- fashion_mnist |
|
language: |
|
- en |
|
metrics: |
|
- accuracy |
|
pipeline_tag: image-classification |
|
--- |
|
|
|
# Fashion-MNIST Baseline Classifier |
|
|
|
## Model Details |
|
|
|
- **Model Name:** fashion-mnist-base |
|
- **Framework:** Custom implementation in Python |
|
- **Version:** 0.1 |
|
- **License:** Apache-2.0 |
|
|
|
## Model Description |
|
|
|
This is a neural network developed from the ground up to classify images from the Fashion-MNIST dataset.
The dataset comprises 70,000 grayscale 28x28 images, each labeled with one of 10 classes:
T-shirt/top, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag, or ankle boot.
|
|
|
## Intended Use |
|
|
|
This model is intended for educational purposes and as a baseline for more complex implementations. It can be used by students and AI enthusiasts |
|
to understand the workings of neural networks and their application in image classification. |
|
|
|
## Training Data |
|
|
|
The model was trained on the Fashion-MNIST dataset, which contains 60,000 training images and 10,000 test images. |
|
Each image is 28x28 pixels, grayscale, associated with one of 10 classes representing different types of clothing and accessories. |
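
For reference, the same 60,000/10,000 split can be pulled through `torchvision`; the snippet below is illustrative and is not this card's own training pipeline.

```python
# Illustrative only: load the Fashion-MNIST splits via torchvision.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()  # scales pixel values to [0, 1]

train_set = datasets.FashionMNIST(root="data", train=True, download=True, transform=to_tensor)
test_set = datasets.FashionMNIST(root="data", train=False, download=True, transform=to_tensor)

print(len(train_set), len(test_set))  # 60000 10000
image, label = train_set[0]
print(image.shape, train_set.classes[label])  # torch.Size([1, 28, 28]) and the class name
```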
|
|
|
### Architecture Details: |
|
- Input layer: 784 neurons (flattened 28x28 image) |
|
- Hidden layer 1: 256 neurons, ReLU activation, Dropout |
|
- Hidden layer 2: 64 neurons, ReLU activation, Dropout |
|
- Output layer: 10 neurons, logits |
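
Although the network is implemented from scratch, an equivalent architecture expressed in PyTorch would look roughly like the sketch below (the dropout probability is an assumption; the card does not state it).

```python
import torch
from torch import nn

class FashionMNISTBase(nn.Module):
    """Rough PyTorch equivalent of the architecture described above."""

    def __init__(self, dropout: float = 0.2):  # dropout rate is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),              # 28x28 image -> 784 inputs
            nn.Linear(784, 256),       # hidden layer 1
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(256, 64),        # hidden layer 2
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(64, 10),         # output layer: raw logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```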
|
|
|
### Hyperparameters: |
|
- Learning rate: 0.005 |
|
- Batch size: 32 |
|
- Epochs: 25 |
|
|
|
The model uses a self-implemented stochastic gradient descent (SGD) optimizer. |
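
As a rough sketch of what such a self-implemented mini-batch SGD loop looks like with the hyperparameters above (the `model.params` / `model.backward` API is hypothetical, not the card's actual code):

```python
import numpy as np

LEARNING_RATE = 0.005
BATCH_SIZE = 32
EPOCHS = 25

def sgd_step(params, grads, lr=LEARNING_RATE):
    """Vanilla SGD: theta <- theta - lr * dL/dtheta, applied in place."""
    for name in params:
        params[name] -= lr * grads[name]

def train(model, x_train, y_train):
    """Hypothetical mini-batch training loop matching the hyperparameters above."""
    n = len(x_train)
    for epoch in range(EPOCHS):
        order = np.random.permutation(n)
        for start in range(0, n, BATCH_SIZE):
            batch = order[start:start + BATCH_SIZE]
            grads = model.backward(x_train[batch], y_train[batch])  # hypothetical API
            sgd_step(model.params, grads)                           # hypothetical API
```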
|
|
|
## Evaluation Results |
|
|
|
The model achieved the following performance on the test set: |
|
- Accuracy: 86.7% |
|
- Precision, Recall, and F1-Score: |
|
|
|
| Label | Precision | Recall | F1-score | |
|
|-------------|-----------|---------|----------| |
|
| T-shirt/Top | 0.847514 | 0.767 | 0.805249 | |
|
| Trouser | 0.982618 | 0.961 | 0.971689 | |
|
| Pullover | 0.800000 | 0.748 | 0.773127 | |
|
| Dress | 0.861868 | 0.886 | 0.873767 | |
|
| Coat | 0.776278 | 0.805 | 0.790378 | |
|
| Sandal | 0.957958 | 0.957 | 0.957479 | |
|
| Shirt | 0.638587 | 0.705 | 0.670152 | |
|
| Sneaker | 0.935743 | 0.932 | 0.933868 | |
|
| Bag | 0.952381 | 0.960 | 0.956175 | |
|
| Ankle-Boot | 0.944554 | 0.954 | 0.949254 | |
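
A per-class breakdown like the table above can be reproduced with scikit-learn, assuming integer arrays of true and predicted test labels (`y_true`, `y_pred`) are already available:

```python
# Illustrative: compute per-class precision / recall / F1 on the test set.
from sklearn.metrics import classification_report

CLASS_NAMES = [
    "T-shirt/Top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle-Boot",
]

# y_true: ground-truth labels, y_pred: model predictions (both of length 10,000).
print(classification_report(y_true, y_pred, target_names=CLASS_NAMES, digits=6))
```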
|
|
|
## Limitations and Biases |
|
|
|
Due to the nature of the training dataset, the model may not capture the full complexity of fashion items in diverse real-world scenarios. |
|
In practice, we found that it is sensitive to background colors and to the proportions of the article within the image.
|
|
|
## How to Use |
|
|
|
```python
import torch
import torchvision.transforms as transforms
from PIL import Image

model = torch.load('fashion-mnist-base.pt')

# Input images need to be transformed into the Fashion-MNIST format:
# 28x28 grayscale, with the article light on a dark background.
transform = transforms.Compose(
    [
        transforms.Resize((28, 28)),
        transforms.Grayscale(),
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,)),         # normalize pixel values
        transforms.Lambda(lambda x: 1.0 - x),         # invert colors
        transforms.Lambda(lambda x: x[0]),            # keep the single channel as a 2D tensor
        transforms.Lambda(lambda x: x.unsqueeze(0)),  # add a batch dimension
    ]
)

img = Image.open('fashion/dress.png')
img = transform(img)
model.predictions(img)  # returns a class-name -> confidence dict (see Sample Output)
```
|
|
|
## Sample Output |
|
|
|
```
{'Dress': 84.437744,
 'Coat': 7.631796,
 'Pullover': 4.2272186,
 'Shirt': 1.297625,
 'T-shirt/Top': 1.2237197,
 'Bag': 0.9053432,
 'Trouser/Jeans': 0.27268794,
 'Sneaker': 0.0031491981,
 'Ankle-Boot': 0.00063403655,
 'Sandal': 8.5103806e-05}
```
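
The scores appear to be softmax probabilities expressed as percentages. If only raw logits are available, a dictionary in this shape could be built roughly as follows; `predict_proba` is a hypothetical helper, and the forward call assumes a standard `nn.Module`-style model.

```python
import torch

CLASS_NAMES = [
    "T-shirt/Top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle-Boot",
]

def predict_proba(model, img: torch.Tensor) -> dict[str, float]:
    """Hypothetical helper: map raw logits to a class-name -> percentage dict."""
    with torch.no_grad():
        logits = model(img)  # assumes a standard forward call
    probs = torch.softmax(logits.squeeze(), dim=0) * 100.0
    order = probs.argsort(descending=True)
    return {CLASS_NAMES[i]: probs[i].item() for i in order.tolist()}
```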