---
tags:
- autoencoder
- image-colorization
- pytorch
- pytorch_model_hub_mixin
license: apache-2.0
datasets:
- flwrlabs/celeba
language:
- en
metrics:
- mse
pipeline_tag: image-to-image
---

# Model Colorization Autoencoder

## Model Description

This autoencoder model is designed for image colorization. It takes single-channel grayscale images as input and outputs three-channel color versions of those images. The model uses an encoder-decoder structure: the encoder compresses the input image into a latent representation, and the decoder reconstructs the image in color.

### Architecture

- **Encoder**: Three blocks of convolution, 2×2 max pooling, ReLU activation, and batch normalization, followed by a flattening layer and a fully connected layer that produces a 4000-dimensional latent vector.
- **Decoder**: A fully connected layer expands the latent vector back to a 16×45×45 feature map, which is then upsampled by three transposed convolutional layers; the first two use ReLU activations and batch normalization, and the final layer maps to 3 channels with a sigmoid activation to produce the color image.

The architecture is defined as follows:

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin


class ModelColorization(nn.Module, PyTorchModelHubMixin):
    def __init__(self):
        super().__init__()
        # Encoder: three conv blocks (conv -> max pool -> ReLU -> batch norm),
        # then flatten and project to a 4000-dimensional latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.Conv2d(32, 16, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.ReLU(),
            nn.BatchNorm2d(16),
            nn.Flatten(),
            nn.Linear(16 * 45 * 45, 4000),
        )
        # Decoder: expand the latent vector back to a 16x45x45 feature map,
        # then upsample with three stride-2 transposed convolutions.
        self.decoder = nn.Sequential(
            nn.Linear(4000, 16 * 45 * 45),
            nn.ReLU(),
            nn.Unflatten(1, (16, 45, 45)),
            nn.ConvTranspose2d(16, 32, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.ConvTranspose2d(32, 64, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.ConvTranspose2d(64, 3, kernel_size=3, stride=2, padding=1, output_padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
```
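
From the flattened size 16 × 45 × 45 and the three 2× poolings, the encoder expects 360 × 360 grayscale inputs, and the three stride-2 transposed convolutions in the decoder restore that resolution with 3 channels. A quick shape sanity check:

```python
import torch

model = ModelColorization()
x = torch.randn(1, 1, 360, 360)   # one grayscale 360x360 image
y = model(x)
print(y.shape)                    # torch.Size([1, 3, 360, 360])
```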

### Training Details

The model was trained with PyTorch for 5 epochs. The training and validation losses observed during training were:

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 0.0063        | 0.0042          |
| 2     | 0.0036        | 0.0035          |
| 3     | 0.0032        | 0.0032          |
| 4     | 0.0030        | 0.0030          |
| 5     | 0.0029        | 0.0030          |

Training loss decreased steadily across all five epochs, while validation loss plateaued at 0.0030 from epoch 4 onward.
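
For reference, here is a minimal sketch of a training loop consistent with these numbers. The MSE reconstruction loss matches the metric listed above; the Adam optimizer, learning rate, and dataloader format are assumptions, as the card does not specify them:

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, device, epochs=5):
    criterion = nn.MSELoss()                                   # metric listed on this card
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer and lr
    model.to(device)
    for epoch in range(epochs):
        model.train()
        train_loss = 0.0
        for gray, color in train_loader:  # assumed (1x360x360, 3x360x360) pairs
            gray, color = gray.to(device), color.to(device)
            optimizer.zero_grad()
            loss = criterion(model(gray), color)
            loss.backward()
            optimizer.step()
            train_loss += loss.item() * gray.size(0)
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for gray, color in val_loader:
                gray, color = gray.to(device), color.to(device)
                val_loss += criterion(model(gray), color).item() * gray.size(0)
        print(f"Epoch {epoch + 1}: "
              f"Training Loss: {train_loss / len(train_loader.dataset):.4f}, "
              f"Validation Loss: {val_loss / len(val_loader.dataset):.4f}")
```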

### Usage

First install the dependencies:

```bash
pip install torch torchvision huggingface_hub
```

Because `ModelColorization` mixes in `PyTorchModelHubMixin`, it can be loaded directly from the Hugging Face Hub with its own `from_pretrained` method (using the class definition above):

```python
model = ModelColorization.from_pretrained("sebastiansarasti/AutoEncoderImageColorization")
```
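
And a minimal inference sketch; `input.jpg` and `colorized.jpg` are placeholder file names, and the 360 × 360 input size follows from the architecture above:

```python
import torch
from PIL import Image
from torchvision import transforms

# Convert an arbitrary image to the 1x360x360 grayscale input the model expects.
preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((360, 360)),
    transforms.ToTensor(),  # scales pixel values to [0, 1]
])

gray = preprocess(Image.open("input.jpg")).unsqueeze(0)  # (1, 1, 360, 360)

model.eval()
with torch.no_grad():
    colorized = model(gray)  # (1, 3, 360, 360), values in [0, 1] from the sigmoid

transforms.ToPILImage()(colorized.squeeze(0)).save("colorized.jpg")
```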