---
language:
- en
tags:
- image-generation
- text-to-image
- conditional-generation
- generative-modeling
- image-synthesis
- image-manipulation
- design-prototyping
- research
- educational
license: mit
metrics:
- FID
- KID
- HWD
- CER
---

# VATr++ (Local Clone Version)

This is a local-clone-friendly version of the **VATr++** styled handwritten text generation model. If you prefer not to rely on `trust_remote_code=True` in `transformers`, you can simply clone this repository and load the model directly.

> **Note**: For:
> - Full training instructions
> - Advanced features (style cycle loss, punctuation modes, etc.)
> - Original code details
>
> please see the [VATr-pp GitHub repository](https://github.com/EDM-Research/VATr-pp). This local version is intended primarily for inference and basic usage.

---

## Installation & Setup

1. **Clone this repository (via Git LFS)**:

   ```bash
   git clone https://huggingface.co/blowing-up-groundhogs/vatrpp
   ```

2. **Create (and activate) a conda environment (recommended)**:

   ```bash
   conda create --name vatr python=3.9
   conda activate vatr
   ```

3. **Install PyTorch (with CUDA if available)**:

   ```bash
   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
   ```

4. **Install additional requirements**:

   ```bash
   pip install transformers opencv-python matplotlib
   ```

---

## Loading the Model Locally

With the repository cloned, you can load either **VATr++** or the **original VATr** model locally.

### **VATr++**

```python
from vatrpp import VATrPP

model_vatr_pp = VATrPP.from_pretrained(
    "vatrpp",              # Local folder name or path
    local_files_only=True
)
```

### **VATr (original)**

```python
from vatrpp import VATrPP

model_vatr = VATrPP.from_pretrained(
    "vatrpp",
    local_files_only=True,
    subfolder="vatr"       # Points to the original VATr checkpoint
)
```

---

## Usage (Inference Example)

Below is a **minimal** usage example demonstrating how to:

1. Load the **VATr++** model from your local clone.
2. Preprocess a style image (an image of handwriting).
3. Generate new handwritten text in the style of the provided image.

```python
import numpy as np
from PIL import Image
import torch
from torchvision import transforms as T

# 1. Load the model (VATr++)
from vatrpp import VATrPP
model = VATrPP.from_pretrained("vatrpp", local_files_only=True)
model.cuda()

# 2. Helper functions to load and process style images
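# Note (descriptive summary of the helpers below, as used in this example):
# style samples are converted to grayscale, resized to a height of 32 px,
# split into fixed-width chunks (192 px by default), padded with white, and
# normalized to [-1, 1]. The chunk width and the number of style images (15)
# are the defaults used in this example; adjust them if your checkpoint
# expects different values.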
def load_image(img, chunk_width=192):
    # Convert to grayscale and resize to a height of 32 pixels
    img = img.convert("L")
    img = img.resize((img.width * 32 // img.height, 32))
    arr = np.array(img)

    # Torchvision transforms: grayscale, to tensor, normalize to [-1, 1]
    transform = T.Compose([
        T.Grayscale(num_output_channels=1),
        T.ToTensor(),
        T.Normalize((0.5,), (0.5,))
    ])

    # Invert, pad to the fixed chunk width, then invert back (white padding)
    arr = 255 - arr
    height, width = arr.shape
    out = np.zeros((height, chunk_width), dtype="float32")
    used_width = min(width, chunk_width)
    out[:, :used_width] = arr[:, :used_width]
    out = 255 - out

    # Apply transforms
    out = transform(Image.fromarray(out.astype(np.uint8)))
    return out, width

def load_image_line(img, chunk_width=192, style_imgs_count=15):
    # Convert to grayscale and resize to a height of 32 pixels
    img = img.convert("L")
    img = img.resize((img.width * 32 // img.height, 32))
    arr = np.array(img)

    # Split the line into fixed-width chunks
    chunks = []
    for start in range(0, arr.shape[1], chunk_width):
        chunks.append(arr[:, start:start + chunk_width])

    # Transform each chunk
    transformed = []
    for chunk in chunks:
        t, _ = load_image(Image.fromarray(chunk), chunk_width)
        transformed.append(t)

    # If there are fewer than `style_imgs_count` chunks, repeat them
    while len(transformed) < style_imgs_count:
        transformed += transformed
    transformed = transformed[:style_imgs_count]

    # Combine into a single (style_imgs_count, 32, chunk_width) tensor
    return torch.cat(transformed, 0)

# 3. Load a style image of your handwriting (or any handwriting sample)
style_image_path = "path/to/your_style_image.png"
img = Image.open(style_image_path)
style_imgs = load_image_line(img)

# 4. Generate text in the style of `style_image_path`
generated_pil_image = model.generate(
    gen_text="This is a test",  # Text to generate
    style_imgs=style_imgs,      # Preprocessed style chunks
    align_words=True,           # Align words at the baseline
    at_once=True,               # Generate the whole line at once
)

# 5. Save the generated image
generated_pil_image.save("generated_output.png")
```

- **`style_imgs`**: A batch of fixed-width image chunks taken from your style reference. In practice, you can supply multiple small style samples or a single line image split into chunks.
- **`gen_text`**: The text to render in the given style.
- **`align_words`** and **`at_once`**: Optional arguments controlling how the text is laid out and generated.

---

## Original Repository

This model is built upon the code from [**EDM-Research/VATr-pp**](https://github.com/EDM-Research/VATr-pp), itself an improvement on the [VATr](https://github.com/aimagelab/VATr) project. Please visit those repositories if you need to:

- Train your own model from scratch
- Explore advanced features (like style cycle loss, punctuation modes, or advanced augmentation)
- Examine experimental details or replicate the original paper's setup

---

## License and Acknowledgments

- The original code and model are released under the license found in [the GitHub repository](https://github.com/EDM-Research/VATr-pp).
- All credit goes to the original authors and maintainers for creating VATr++ and releasing it openly.
- This local version is intended to simplify offline usage and keep everything self-contained.