---
library_name: keras
license: mit
language:
- en
pipeline_tag: image-to-image
tags:
- art
- pixel_art
- character_sprite
- missing_data_imputation
- image_to_image
---

## Model description

The MDIGAN-Characters model was proposed at SBGames 2024 ([paper on arXiv][paper-arxiv], [page][paper-page], [demo][paper-demo]).
It is trained to generate a character in a missing pose: for instance,
given images of a character facing back, left, and right, it can generate the character facing front (a missing data imputation task).
![](https://i.imgur.com/s5ONl9Q.png)

The model's architecture is based on that of [CollaGAN][paper-collagan], a model trained to impute images in missing domains
in a multi-domain scenario. In our case, the domains are the directions a character might face, i.e., back, left, front, and right.

We tested the model with 3 input images to generate the missing one, and we also evaluated the quality of the generated
images when the model receives only 2 or 1 input images.

The inputs to the model are the target (missing) domain and 4 image-like tensors with size 64x64x4 in the order
back, left, front, and right. The input images should be floating point tensors in the range of [-1, 1]. 
In place of the missing image(s), we must provide a tensor with shape 64x64x4 filled with zeros.
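
The input layout described above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: `to_model_range` and `build_inputs` are hypothetical helper names, the source images are assumed to be RGBA `uint8` arrays, and representing the target domain as an integer index is an assumption, since this card does not state how the model expects it to be encoded or batched.

```python
import numpy as np

# Domain order as stated in this card.
DOMAINS = ("back", "left", "front", "right")


def to_model_range(image_uint8):
    """Map an RGBA uint8 image (values 0..255) to float32 in [-1, 1]."""
    return image_uint8.astype(np.float32) / 127.5 - 1.0


def build_inputs(available, target):
    """Stack the four views in card order, leaving zeros for missing ones.

    `available` maps a domain name to a 64x64x4 uint8 array (hypothetical
    helper; the target index encoding is an assumption).
    Returns (target_index, images) with images of shape (4, 64, 64, 4).
    """
    if target in available:
        raise ValueError("the target domain must be one of the missing views")
    images = np.zeros((len(DOMAINS), 64, 64, 4), dtype=np.float32)
    for i, name in enumerate(DOMAINS):
        if name in available:
            images[i] = to_model_range(available[name])
    return DOMAINS.index(target), images
```

Each of the four slices of `images` would then be passed to the model together with the target domain, after adding whatever batch dimension the loaded model expects.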


[paper-collagan]: https://www.computer.org/csdl/proceedings-article/cvpr/2019/329300c482/1gys5gg67QY
[paper-arxiv]: https://arxiv.org/abs/2409.10721
[paper-page]: https://fegemo.github.io/mdigan-characters
[paper-demo]: https://fegemo.github.io/interactive-generator

## Intended uses & limitations

This model can be used for research purposes only. The quality of the generated images varies a lot, and a
post-processing step that quantizes the colors of the generated image to the intended palette is beneficial.
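
One simple way to do such a quantization is a nearest-color lookup, sketched below. It assumes the intended palette is available as a `(K, C)` array in the same value range and channel layout as the generated image; `quantize_to_palette` is a hypothetical helper, and plain Euclidean distance over the channels is an illustrative choice rather than the method used in the paper.

```python
import numpy as np


def quantize_to_palette(image, palette):
    """Replace each pixel with its nearest palette color.

    image:   (H, W, C) array, e.g. a generated 64x64x4 image.
    palette: (K, C) array of allowed colors in the same value range.
    Distance is squared Euclidean over all channels (an assumption).
    """
    pixels = image.reshape(-1, 1, image.shape[-1]).astype(np.float32)
    colors = palette[np.newaxis].astype(np.float32)
    # Broadcasting (N, 1, C) against (1, K, C) gives per-pixel, per-color distances.
    distances = np.sum((pixels - colors) ** 2, axis=-1)
    nearest = np.argmin(distances, axis=1)
    return palette[nearest].reshape(image.shape)
```

Because every output pixel is copied from `palette`, the result contains only palette colors, which removes the slight color drift that generated pixel art tends to show.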


## Training and evaluation data

The model was trained with the [PAC dataset][pac], which features 12,074 paired images of pixel art characters
in 4 directions: back, left, front, and right. Compared to StarGAN and Pix2Pix-based baselines, the MDIGAN-Characters
model yielded much better images when it received 3 images, and still good images when only 2 were provided.

[pac]: https://github.com/plucksquire/pac/