---
language:
- en
tags:
- image-generation
- text-to-image
- conditional-generation
- generative-modeling
- image-synthesis
- image-manipulation
- design-prototyping
- research
- educational
license: mit
metrics:
- FID
- KID
- HWD
- CER
---

# VATr++ (Local Clone Version)

This is a local-clone-friendly version of the **VATr++** styled handwritten text generation model. If you prefer not to rely on `trust_remote_code=True` in `transformers`, you can simply clone this repository and load the model directly.

> **Note**: For:
> - Full training instructions
> - Advanced features (style cycle loss, punctuation modes, etc.)
> - Original code details
>
> please see the [VATr-pp GitHub repository](https://github.com/EDM-Research/VATr-pp). This local version is intended primarily for inference and basic usage.

---

## Installation & Setup

1. **Clone this repository (via Git LFS)**:

   ```bash
   git clone https://huggingface.co/blowing-up-groundhogs/vatrpp
   ```

2. **Create (and activate) a conda environment (recommended)**:

   ```bash
   conda create --name vatr python=3.9
   conda activate vatr
   ```

3. **Install PyTorch (with CUDA if available)**:

   ```bash
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
   ```

4. **Install additional requirements**:

   ```bash
   pip install transformers opencv-python matplotlib
   ```

---

## Loading the Model Locally

With the repository cloned, you can load either **VATr++** or the **original VATr** model locally.

### **VATr++**

```python
from vatrpp import VATrPP

model_vatr_pp = VATrPP.from_pretrained(
    "vatrpp",              # Local folder name or path
    local_files_only=True
)
```

### **VATr (original)**

```python
from vatrpp import VATrPP

model_vatr = VATrPP.from_pretrained(
    "vatrpp",
    local_files_only=True,
    subfolder="vatr"       # Points to the original VATr checkpoint
)
```

---

## Usage (Inference Example)

Below is a **minimal** usage example demonstrating how to:

1. Load the **VATr++** model from your local clone.
2. Preprocess a style image (an image of handwriting).
3. Generate new handwritten text in the style of the provided image.

```python
import numpy as np
from PIL import Image
import torch
from torchvision import transforms as T

# 1. Load the model (VATr++)
from vatrpp import VATrPP
model = VATrPP.from_pretrained("vatrpp", local_files_only=True)
model.cuda()

# 2. Helper functions to load and process style images
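# Note (descriptive summary of the helpers below, as used in this example):
# style samples are converted to grayscale, resized to a height of 32 px,
# split into fixed-width chunks (192 px by default), padded with white, and
# normalized to [-1, 1]. The chunk width and the number of style images (15)
# are the defaults used in this example; adjust them if your checkpoint
# expects different values.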
def load_image(img, chunk_width=192):
    # Convert to grayscale and resize to a height of 32 pixels
    img = img.convert("L")
    img = img.resize((img.width * 32 // img.height, 32))
    arr = np.array(img)

    # Torchvision transforms: grayscale, to tensor, normalize to [-1, 1]
    transform = T.Compose([
        T.Grayscale(num_output_channels=1),
        T.ToTensor(),
        T.Normalize((0.5,), (0.5,))
    ])

    # Invert, pad to the fixed chunk width, then invert back (white padding)
    arr = 255 - arr
    height, width = arr.shape
    out = np.zeros((height, chunk_width), dtype="float32")
    used_width = min(width, chunk_width)
    out[:, :used_width] = arr[:, :used_width]
    out = 255 - out

    # Apply transforms
    out = transform(Image.fromarray(out.astype(np.uint8)))
    return out, width

def load_image_line(img, chunk_width=192, style_imgs_count=15):
    # Convert to grayscale and resize to a height of 32 pixels
    img = img.convert("L")
    img = img.resize((img.width * 32 // img.height, 32))
    arr = np.array(img)

    # Split the line into fixed-width chunks
    chunks = []
    for start in range(0, arr.shape[1], chunk_width):
        chunks.append(arr[:, start:start + chunk_width])

    # Transform each chunk
    transformed = []
    for chunk in chunks:
        t, _ = load_image(Image.fromarray(chunk), chunk_width)
        transformed.append(t)

    # If there are fewer than `style_imgs_count` chunks, repeat them
    while len(transformed) < style_imgs_count:
        transformed += transformed
    transformed = transformed[:style_imgs_count]

    # Combine into a single (style_imgs_count, 32, chunk_width) tensor
    return torch.cat(transformed, 0)

# 3. Load a style image of your handwriting (or any handwriting sample)
style_image_path = "path/to/your_style_image.png"
img = Image.open(style_image_path)
style_imgs = load_image_line(img)

# 4. Generate text in the style of `style_image_path`
generated_pil_image = model.generate(
    gen_text="This is a test",  # Text to generate
    style_imgs=style_imgs,      # Preprocessed style chunks
    align_words=True,           # Align words at the baseline
    at_once=True,               # Generate the whole line at once
)

# 5. Save the generated image
generated_pil_image.save("generated_output.png")
```

- **`style_imgs`**: A batch of fixed-width image chunks taken from your style reference. In practice, you can supply multiple small style samples or a single line image split into chunks.
- **`gen_text`**: The text to render in the given style.
- **`align_words`** and **`at_once`**: Optional arguments controlling how the text is laid out and generated.

---

## Original Repository

This model is built upon the code from [**EDM-Research/VATr-pp**](https://github.com/EDM-Research/VATr-pp), itself an improvement on the [VATr](https://github.com/aimagelab/VATr) project. Please visit those repositories if you need to:

- Train your own model from scratch
- Explore advanced features (like style cycle loss, punctuation modes, or advanced augmentation)
- Examine experimental details or replicate the original paper's setup

---

## License and Acknowledgments

- The original code and model are released under the license found in [the GitHub repository](https://github.com/EDM-Research/VATr-pp).
- All credit goes to the original authors and maintainers for creating VATr++ and releasing it openly.
- This local version is intended to simplify offline usage and keep everything self-contained.