Kemsekov/vqvae2-pixelart

Kemsekov committed 27 days ago

Commit

541ccc4

verified ·

1 Parent(s): 0a7cbdb

Update README.md

Files changed (1)

README.md +104 -3

@@ -1,3 +1,104 @@
----
-license: mit
----

+---
+license: mit
+---
+Custom hand-made 3-scale VQVAE trained on a private dataset of about 4k pixel-art images.
+Source code for the model can be found [here](https://github.com/Kemsekov/kemsekov_torch/tree/main/vqvae).
+It achieved an R² score of 0.987 on image reconstruction.
+The model uses the following codebook sizes:
+* 512 bottom
+* 512 mid
+* 256 top
+This gives the model enough capacity to achieve good reconstruction metrics.
+Here is a code example showing how to use it:
+```py
+import random
+import PIL.Image
+from matplotlib import pyplot as plt
+import torch
+import torchvision.transforms as T
+sample = PIL.Image.open("sample_images/cat.png") # your sample image
+sample = T.ToTensor()(sample)[None,:] # add batch dimension
+sample = T.Resize((512,512))(sample) # optional; this VQVAE works fine with any input image size
+vqvae=torch.jit.load("model.pt")
+# rec is the reconstruction (logits; apply sigmoid to get pixel values)
+# z is a list of latent-space tensors, one per scale
+# z_q is a list of the quantized latent-space tensors
+# ind is a list of codebook indices of the quantized elements, one tensor per scale
+rec, z, z_q, ind = vqvae.eval().cpu()(sample)
+rec_ind = vqvae.decode_from_ind(ind)
+rec=rec.sigmoid()
+rec_ind=rec_ind.sigmoid()
+print("Original image shape",list(sample.shape[1:]))
+print("ind shapes",[list(v.shape[1:]) for v in ind])
+plt.figure(figsize=(18,6))
+plt.subplot(1,3,1)
+plt.imshow(T.ToPILImage()(sample[0]).resize((256,256)))
+plt.title("original")
+plt.axis('off')
+plt.subplot(1,3,2)
+plt.imshow(T.ToPILImage()(rec[0]).resize((256,256)))
+plt.title("reconstruction")
+plt.axis('off')
+plt.subplot(1,3,3)
+plt.imshow(T.ToPILImage()(rec_ind[0]).resize((256,256)))
+plt.title("reconstruction from ind")
+plt.axis('off')
+plt.show()
+plt.figure(figsize=(18,6))
+plt.subplot(1,3,1)
+plt.imshow(T.ToPILImage()(ind[0]/512).resize((256,256)))
+plt.title("ind0")
+plt.axis('off')
+plt.subplot(1,3,2)
+plt.imshow(T.ToPILImage()(ind[1]/512).resize((256,256)))
+plt.title("ind1")
+plt.axis('off')
+plt.subplot(1,3,3)
+plt.imshow(T.ToPILImage()(ind[2]/256).resize((256,256)))
+plt.title("ind2")
+plt.axis('off')
+plt.show()
+print("latent space render")
+for z_ in z:
+    dims = len(z_[0])
+    dims_sqrt = int(dims**0.5)
+    plt.figure(figsize=(10,10))
+    plt.axis('off')
+    for i in range(dims_sqrt):
+        for j in range(dims_sqrt):
+            slice_ind = i*dims_sqrt+j
+            slice_ind_end = slice_ind+1
+            plt.subplot(dims_sqrt,dims_sqrt,slice_ind+1)
+            plt.imshow(T.ToPILImage()(z_[0][slice_ind:slice_ind_end]))
+            plt.axis('off')
+    plt.show()
+```
+```
+Original image shape [3, 512, 512]
+ind shapes [[128, 128], [64, 64], [32, 32]]
+```
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/633b160acbdbadd99c094172/d3PSfPu9tkKZkdMv8UJSV.png)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/633b160acbdbadd99c094172/pDOPnZtAh05UXfkFaklkq.png)
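
The codebook sizes and the index shapes printed in the README give a rough idea of how compact the discrete representation is. The sketch below is a back-of-the-envelope estimate, not part of the model's code. It assumes that `ind[0]`, `ind[1]`, `ind[2]` correspond to the bottom, mid, and top codebooks (consistent with the `/512`, `/512`, `/256` normalization in the example), that each index is stored with the minimum whole number of bits and no entropy coding, and that the input is 8-bit RGB. The names `codebook_sizes`, `index_shapes`, and `index_bits` are hypothetical and introduced only for illustration.

```python
# Assumed mapping of scales to codebook sizes, taken from the README list.
codebook_sizes = {"bottom": 512, "mid": 512, "top": 256}

# Index-map shapes printed by the README example for a 3x512x512 input.
index_shapes = {"bottom": (128, 128), "mid": (64, 64), "top": (32, 32)}


def index_bits(shapes, sizes):
    """Total bits needed to store all index maps with fixed-width codes."""
    total = 0
    for name, (height, width) in shapes.items():
        # Minimum bits to address every codebook entry, e.g. 9 bits for 512 entries.
        bits_per_index = (sizes[name] - 1).bit_length()
        total += height * width * bits_per_index
    return total


# Raw size of the input image, assuming 8 bits per channel.
raw_bits = 3 * 512 * 512 * 8
code_bits = index_bits(index_shapes, codebook_sizes)
ratio = raw_bits / code_bits

print(f"raw image:   {raw_bits // 8} bytes")
print(f"index maps:  {code_bits // 8} bytes")
print(f"ratio:       {ratio:.2f}x")
```

Under these assumptions the three index maps take about 24 KB against 768 KB for the raw image, roughly a 32x reduction. The decoder weights and codebooks are not counted in this figure.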