---
tags:
- autoencoder
- image-colorization
- pytorch
- pytorch_model_hub_mixin
license: apache-2.0
datasets:
- flwrlabs/celeba
language:
- en
metrics:
- mse
pipeline_tag: image-to-image
---
# Model Colorization Autoencoder
## Model Description
This autoencoder model is designed for image colorization. It takes grayscale images as input and outputs colorized versions of those images. The model architecture consists of an encoder-decoder structure, where the encoder compresses the input image into a latent representation, and the decoder reconstructs the image in color.
### Architecture
- **Encoder**: Three convolutional layers, each followed by max pooling, a ReLU activation, and batch normalization. A flattening layer and a fully connected layer then produce a 4000-dimensional latent vector.
- **Decoder**: Mirrors the encoder: a linear layer restores the 16 × 45 × 45 feature map, which three transposed convolutional layers with ReLU activations and batch normalization upsample. The final layer outputs a color image through a sigmoid activation.
The architecture details are as follows:
```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin


class ModelColorization(nn.Module, PyTorchModelHubMixin):
    def __init__(self):
        super().__init__()
        # Encoder: three Conv -> MaxPool -> ReLU -> BatchNorm stages,
        # then flatten into a 4000-dimensional latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.Conv2d(32, 16, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Flatten(),
            nn.Linear(16 * 45 * 45, 4000),
        )
        # Decoder: project back to a 16x45x45 feature map, then upsample with
        # three transposed convolutions; the sigmoid maps outputs into [0, 1].
        self.decoder = nn.Sequential(
            nn.Linear(4000, 16 * 45 * 45),
            nn.ReLU(),
            nn.Unflatten(1, (16, 45, 45)),
            nn.ConvTranspose2d(16, 32, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.ConvTranspose2d(32, 64, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.ConvTranspose2d(64, 3, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
```
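The `16 * 45 * 45` flatten size implies 360 × 360 grayscale inputs, since three 2× pooling stages reduce 360 to 45. A quick sanity check of the round trip, as a sketch under that assumption:

```python
import torch

model = ModelColorization()
dummy = torch.randn(1, 1, 360, 360)  # one 360x360 grayscale image (assumed input size)
out = model(dummy)
print(out.shape)  # torch.Size([1, 3, 360, 360]): an RGB image with values in [0, 1]
```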
### Training Details
The model was trained with PyTorch for 5 epochs. The training and validation losses observed per epoch were:

| Epoch | Training Loss | Validation Loss |
|------:|--------------:|----------------:|
| 1 | 0.0063 | 0.0042 |
| 2 | 0.0036 | 0.0035 |
| 3 | 0.0032 | 0.0032 |
| 4 | 0.0030 | 0.0030 |
| 5 | 0.0029 | 0.0030 |

Training loss decreased every epoch, while validation loss fell steadily before leveling off at about 0.0030 by the final epoch.
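For reference, here is a minimal sketch of a training loop of this kind, assuming MSE loss (matching the `mse` metric above) and the Adam optimizer; the actual optimizer, learning rate, and data loader are not stated in this card and are hypothetical:

```python
import torch
import torch.nn as nn

model = ModelColorization()
criterion = nn.MSELoss()  # assumed from the `mse` metric tag above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # hypothetical learning rate

for epoch in range(5):
    model.train()
    for gray, color in train_loader:  # hypothetical DataLoader yielding (grayscale, RGB) pairs
        optimizer.zero_grad()
        pred = model(gray)            # (N, 1, 360, 360) -> (N, 3, 360, 360)
        loss = criterion(pred, color)
        loss.backward()
        optimizer.step()
```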
### Usage
Since `ModelColorization` mixes in `PyTorchModelHubMixin`, you can load the weights from the Hugging Face Hub directly through the class:

```python
# Install the necessary dependencies first:
#   pip install torch huggingface_hub

# With the ModelColorization class defined as above:
model = ModelColorization.from_pretrained("sebastiansarasti/AutoEncoderImageColorization")
```
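An end-to-end inference sketch follows. The 360 × 360 input size is inferred from the encoder's flatten dimensions, and the file names are placeholders:

```python
import torch
from PIL import Image
from torchvision import transforms

# Preprocess: convert to single-channel grayscale and resize to the
# 360x360 input size implied by the encoder's 16*45*45 flatten layer.
preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((360, 360)),
    transforms.ToTensor(),
])

image = Image.open("input.jpg")        # placeholder path
gray = preprocess(image).unsqueeze(0)  # add batch dimension: (1, 1, 360, 360)

model.eval()
with torch.no_grad():
    colorized = model(gray).squeeze(0)  # (3, 360, 360), values in [0, 1]

transforms.ToPILImage()(colorized).save("colorized.jpg")
```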