# VQ-VAE for MNIST
This is a Vector Quantized Variational Autoencoder (VQ-VAE) trained on the MNIST dataset using PyTorch. The model compresses and reconstructs grayscale handwritten digits and is used as part of an image augmentation and generative modeling pipeline.
## 🧠 Model Details
- Model Type: VQ-VAE
- Dataset: MNIST
- Epochs: 35
- Latent Space: Discrete (quantized vectors)
- Input Size: 64×64 (resized and converted to RGB)
- Reconstruction Loss: MSE-based
- Implementation: Custom PyTorch with 3-layer Conv Encoder/Decoder
- FID Score: 53.21
- Loss Curve: see `loss_curve.png`
This model learns compressed representations of digit images using vector quantization. The reconstructions can be used for augmentation or for downstream generative tasks.
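The repository's `models/vqvae/model.py` is not reproduced on this card, so the sketch below is a minimal, assumed implementation consistent with the details above (3-layer conv encoder/decoder, 64×64 RGB input, discrete quantized latent, MSE-based reconstruction loss). The codebook size, channel widths, and commitment weight `beta` are illustrative choices, not values read from the trained model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VectorQuantizer(nn.Module):
    """Snaps each encoder output vector to its nearest codebook entry."""

    def __init__(self, num_embeddings=512, embedding_dim=64, beta=0.25):
        super().__init__()
        self.beta = beta  # commitment-loss weight (assumed value)
        self.codebook = nn.Embedding(num_embeddings, embedding_dim)
        self.codebook.weight.data.uniform_(-1 / num_embeddings, 1 / num_embeddings)

    def forward(self, z):  # z: [B, D, H, W]
        B, D, H, W = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, D)  # [B*H*W, D]
        # Squared L2 distance from every latent vector to every codebook entry.
        dist = (flat.pow(2).sum(1, keepdim=True)
                - 2 * flat @ self.codebook.weight.t()
                + self.codebook.weight.pow(2).sum(1))
        idx = dist.argmin(dim=1)
        q = self.codebook(idx).view(B, H, W, D).permute(0, 3, 1, 2)
        # Codebook loss pulls entries toward encoder outputs; commitment
        # loss keeps encoder outputs near their chosen entries.
        vq_loss = F.mse_loss(q, z.detach()) + self.beta * F.mse_loss(z, q.detach())
        q = z + (q - z).detach()  # straight-through gradient estimator
        return q, vq_loss


class VQVAE(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(  # 3 conv layers: 64x64 -> 8x8
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, latent_dim, 4, stride=2, padding=1),
        )
        self.quantizer = VectorQuantizer(embedding_dim=latent_dim)
        self.decoder = nn.Sequential(  # mirror of the encoder: 8x8 -> 64x64
            nn.ConvTranspose2d(latent_dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        q, vq_loss = self.quantizer(z)
        return self.decoder(q), vq_loss
```

Under this sketch, the training objective would be `F.mse_loss(recon, x) + vq_loss`. The straight-through line (`z + (q - z).detach()`) lets reconstruction gradients flow back to the encoder even though the nearest-neighbor lookup itself is non-differentiable.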
## 📁 Files

- `generator.pt`: trained VQ-VAE model weights.
- `loss_curve.png`: plot of training loss across 35 epochs.
- `fid_score.json`: stored Fréchet Inception Distance (FID) evaluation result.
- `fid_real/` and `fid_fake/`: 1000 real and generated images used for FID computation.
## 📦 How to Use

```python
import torch
from models.vqvae.model import VQVAE

# Instantiate the architecture and load the trained weights.
model = VQVAE()
model.load_state_dict(torch.load("generator.pt", map_location="cpu"))
model.eval()
```
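A hedged end-to-end example follows, assuming the preprocessing implied by the "Input Size" note above (resize to 64×64, replicate the grayscale channel to RGB) and the `(reconstruction, vq_loss)` forward signature from the sketch in "Model Details"; the actual `VQVAE.forward` may return something different.

```python
from torchvision import datasets, transforms

# Match the card's stated input format: 64x64, 3-channel.
preprocess = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.Grayscale(num_output_channels=3),  # replicate gray channel to RGB
    transforms.ToTensor(),
])

digit, label = datasets.MNIST("data", train=False, download=True)[0]
x = preprocess(digit).unsqueeze(0)  # [1, 3, 64, 64]

with torch.no_grad():
    reconstruction, _ = model(x)    # assumed (recon, vq_loss) return
print(reconstruction.shape)         # torch.Size([1, 3, 64, 64])
```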