jesseab committed
Commit 0556c0e · 1 Parent(s): a4d6fb3

Add Brain2Vec-v2 files and model card

.DS_Store ADDED
Binary file (6.15 kB).
 
README.md CHANGED
@@ -1,3 +1,119 @@
- ---
- license: apache-2.0
- ---
---
base_model: radiata-ai/brain2vec
license: apache-2.0
language:
- en
task_categories:
- image-classification
tags:
- medical
- brain-data
- mri
pretty_name: 3D Brain Structure MRI Autoencoder
---

## 🧠 Model Summary
# brain2vec
Version 2 of an autoencoder model for brain structure T1 MRIs (forked from [Brain Latent Progression](https://github.com/LemuelPuglisi/BrLP/tree/main)). The autoencoder takes a 3D T1 MRI NIfTI file as input, compresses it to 1200 latent dimensions, and reconstructs the image (a minimal usage sketch follows the loss list below). The loss functions used to train the autoencoder are:
- [L1Loss](https://pytorch.org/docs/stable/generated/torch.nn.L1Loss.html)
- [KLDivergenceLoss](https://pytorch.org/docs/stable/generated/torch.nn.KLDivLoss.html)
- [PatchAdversarialLoss](https://docs.monai.io/en/stable/losses.html#patchadversarialloss)
- [PerceptualLoss](https://docs.monai.io/en/stable/losses.html#perceptualloss)

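A minimal Python sketch of that encode/reconstruct step, using the `Brain2vec` and `preprocess_mri` helpers defined in `inference_brain2vec.py` in this repository (the scan path is a placeholder):

```
import torch
from inference_brain2vec import Brain2vec, preprocess_mri

# load the pretrained weights shipped with this repository
model = Brain2vec.from_pretrained("autoencoder_final.pth", device="cpu")

# resample to 2mm, crop to 80x96x80, scale intensity to [0, 1] -> (1, 1, 80, 96, 80)
image = preprocess_mri("/path/to/scan.nii.gz")

with torch.no_grad():
    reconstruction, z_mu, z_sigma = model(image)

# z_mu has shape (1, 1, 10, 12, 10); flattening it gives the 1200-dimensional embedding
embedding = z_mu.flatten()
```
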
# Training data
[Radiata brain-structure](https://huggingface.co/datasets/radiata-ai/brain-structure): 3066 scans from 2085 individuals in the 'train' split. Mean age = 45.1 ± 24.5 years, including 2847 scans from cognitively normal subjects and 219 scans from individuals with a clinical diagnosis of Alzheimer's disease.

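The 'train' split can be loaded directly with the Hugging Face `datasets` library (the same call used by `create_csv.py` below):

```
from datasets import load_dataset

ds_train = load_dataset("radiata-ai/brain-structure", split="train", trust_remote_code=True)
print(len(ds_train))                  # 3066 scans
print(ds_train[0]["nii_filepath"])    # local path to the preprocessed NIfTI file
```
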
# Example usage
```
# get the brain2vec model repository
git clone https://huggingface.co/radiata-ai/brain2vec
cd brain2vec

# pull the pre-trained model weights
sudo apt-get update
sudo apt install git-lfs
git lfs install
git lfs pull

# set up a virtual environment
python3 -m venv venv_brain2vec
source venv_brain2vec/bin/activate

# install Python libraries
pip install -r requirements.txt

# create the csv file inputs.csv listing the scan paths and other info;
# this script loads the radiata-ai/brain-structure dataset from Hugging Face
python create_csv.py

mkdir ae_cache
mkdir ae_output

# train the model
nohup python train_brain2vec.py \
    --dataset_csv inputs.csv \
    --cache_dir ./ae_cache \
    --output_dir ./ae_output \
    --n_epochs 10 \
    > train_log.txt 2>&1 &

# model inference
# for a set of scans in inputs.csv
python inference_brain2vec.py \
    --checkpoint_path /path/to/model.pth \
    --csv_input inputs.csv \
    --output_dir ./ae_output \
    --embeddings_filename ae_embeddings_all.npy

# or for individual scans
python inference_brain2vec.py \
    --checkpoint_path /path/to/model.pth \
    --input_images /path/to/img1.nii.gz /path/to/img2.nii.gz \
    --output_dir ./ae_output \
    --embeddings_filename ae_embeddings_2.npy
```

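After inference, the stacked latent means can be read back with NumPy; assuming the default latent shape of (1, 10, 12, 10), each scan flattens to a 1200-dimensional embedding:

```
import numpy as np

z_mu = np.load("ae_output/ae_embeddings_all.npy")    # shape: (N, 1, 10, 12, 10)
embeddings = z_mu.reshape(z_mu.shape[0], -1)         # shape: (N, 1200)
```
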
# Methods
Input scan image dimensions are 113x137x113 at 1.5mm^3 resolution, aligned to MNI152 space (see [radiata-ai/brain-structure](https://huggingface.co/datasets/radiata-ai/brain-structure)).

The image transform crops to 80 x 96 x 80 at 2mm^3 resolution and scales image intensity to the range [0, 1] (see the MONAI pipeline sketched below).

The model was trained with an effective batch size of 16, 10 epochs, and a learning rate of 1e-4 (see references 1 and 2).

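The transform is implemented with MONAI dictionary transforms in `train_brain2vec.py` and `inference_brain2vec.py`:

```
from monai import transforms

transforms_fn = transforms.Compose([
    transforms.CopyItemsD(keys={'image_path'}, names=['image']),
    transforms.LoadImageD(image_only=True, keys=['image']),
    transforms.EnsureChannelFirstD(keys=['image']),
    transforms.SpacingD(pixdim=2, keys=['image']),                 # resample to 2mm^3
    transforms.ResizeWithPadOrCropD(spatial_size=(80, 96, 80), mode='minimum', keys=['image']),
    transforms.ScaleIntensityD(minv=0, maxv=1, keys=['image']),    # intensity to [0, 1]
])
```
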
# References
1. Puglisi L, Alexander DC, Ravì D. Enhancing Spatiotemporal Disease Progression Models via Latent Diffusion and Prior Knowledge [Internet]. arXiv; 2024. Available from: http://arxiv.org/abs/2405.03328
2. Pinaya WHL, Tudosiu PD, Dafflon J, Costa PF da, Fernandez V, Nachev P, et al. Brain Imaging Generation with Latent Diffusion Models [Internet]. arXiv; 2022. Available from: http://arxiv.org/abs/2209.07162

# Citation
```
@misc{Radiata-Brain2vec,
  author = {Jesse Brown and Clayton Young},
  title = {Brain2vec: An Autoencoder Model for Brain Structure T1 MRIs},
  year = {2025},
  url = {https://huggingface.co/radiata-ai/brain2vec},
  note = {Version 1.0},
  publisher = {Hugging Face}
}
```

# License
### Apache License 2.0

Copyright 2025 Jesse Brown

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at:

[http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
autoencoder_final.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f66d42dd3bd58c39a110497ea463c35e52dfed097274338a37cb2efbfc4bf11c
3
+ size 339644650
create_csv.py ADDED
@@ -0,0 +1,39 @@
1
+ #!/usr/bin/env python3
2
+ import os
3
+ import pandas as pd
4
+ from datasets import load_dataset
5
+
6
+ def row_to_dict(row, split_name):
7
+ return {
8
+ "image_uid": row["id"],
9
+ "age": int(row["metadata"]["age"]),
10
+ "sex": 1 if row["metadata"]["sex"].lower() == "male" else 2,
11
+ "image_path": os.path.abspath(row["nii_filepath"]),
12
+ "split": split_name
13
+ }
14
+
15
+ def main():
16
+ # Load the datasets
17
+ ds_train = load_dataset("radiata-ai/brain-structure", split="train", trust_remote_code=True)
18
+ ds_val = load_dataset("radiata-ai/brain-structure", split="validation", trust_remote_code=True)
19
+ ds_test = load_dataset("radiata-ai/brain-structure", split="test", trust_remote_code=True)
20
+
21
+ rows = []
22
+
23
+ # Process each split
24
+ for data_row in ds_train:
25
+ rows.append(row_to_dict(data_row, "train"))
26
+ for data_row in ds_val:
27
+ rows.append(row_to_dict(data_row, "validation"))
28
+ for data_row in ds_test:
29
+ rows.append(row_to_dict(data_row, "test"))
30
+
31
+ # Create a DataFrame and write it to CSV
32
+ df = pd.DataFrame(rows)
33
+ output_csv = "inputs.csv"
34
+ df.to_csv(output_csv, index=False)
35
+ print(f"CSV file created: {output_csv}")
36
+
37
+ if __name__ == "__main__":
38
+ main()
39
+
discriminator_final.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4ac993e13040de22843bb077dbb77cf0904e0aafced8429ba3ec3adfb47b3d02
3
+ size 11099084
inference_brain2vec.py ADDED
@@ -0,0 +1,240 @@
1
+ #!/usr/bin/env python3
2
+
3
+ """
4
+ inference_brain2vec.py
5
+
6
+ Loads a pretrained Brain2vec VAE (AutoencoderKL) model and performs inference
7
+ on one or more MRI images, generating reconstructions and latent parameters
8
+ (z_mu, z_sigma).
9
+
10
+ Example usage:
11
+
12
+ # 1) Multiple file paths
13
+ python inference_brain2vec.py \
14
+ --checkpoint_path /path/to/autoencoder_checkpoint.pth \
15
+ --input_images /path/to/img1.nii.gz /path/to/img2.nii.gz \
16
+ --output_dir ./vae_inference_outputs \
+ --embeddings_filename ae_embeddings_2.npy
18
+
19
+ # 2) Use a CSV containing image paths
20
+ python inference_brain2vec.py \
21
+ --checkpoint_path /path/to/autoencoder_checkpoint.pth \
22
+ --csv_input /path/to/images.csv \
23
+ --output_dir ./vae_inference_outputs \
+ --embeddings_filename ae_embeddings_all.npy
24
+ """
25
+
26
+ import os
27
+ import argparse
28
+ import numpy as np
29
+ import torch
30
+ import torch.nn as nn
31
+ from typing import Optional
32
+ from monai.transforms import (
33
+ Compose,
34
+ CopyItemsD,
35
+ LoadImageD,
36
+ EnsureChannelFirstD,
37
+ SpacingD,
38
+ ResizeWithPadOrCropD,
39
+ ScaleIntensityD,
40
+ )
41
+ from generative.networks.nets import AutoencoderKL
42
+ import pandas as pd
43
+
44
+
45
+ RESOLUTION = 2
46
+ INPUT_SHAPE_AE = (80, 96, 80)
47
+
48
+ transforms_fn = Compose([
49
+ CopyItemsD(keys={'image_path'}, names=['image']),
50
+ LoadImageD(image_only=True, keys=['image']),
51
+ EnsureChannelFirstD(keys=['image']),
52
+ SpacingD(pixdim=RESOLUTION, keys=['image']),
53
+ ResizeWithPadOrCropD(spatial_size=INPUT_SHAPE_AE, mode='minimum', keys=['image']),
54
+ ScaleIntensityD(minv=0, maxv=1, keys=['image']),
55
+ ])
56
+
57
+
58
+ def preprocess_mri(image_path: str, device: str = "cpu") -> torch.Tensor:
59
+ """
60
+ Preprocess an MRI using MONAI transforms to produce
61
+ a 5D tensor (batch=1, channel=1, D, H, W) for inference.
62
+
63
+ Args:
64
+ image_path (str): Path to the MRI (e.g. .nii.gz).
65
+ device (str): Device to place the tensor on.
66
+
67
+ Returns:
68
+ torch.Tensor: Shape (1, 1, D, H, W).
69
+ """
70
+ data_dict = {"image_path": image_path}
71
+ output_dict = transforms_fn(data_dict)
72
+ image_tensor = output_dict["image"] # shape: (1, D, H, W)
73
+ image_tensor = image_tensor.unsqueeze(0) # => (1, 1, D, H, W)
74
+ return image_tensor.to(device)
75
+
76
+
77
+ class Brain2vec(AutoencoderKL):
78
+ """
79
+ Subclass of MONAI's AutoencoderKL that includes:
80
+ - a from_pretrained(...) for loading a .pth checkpoint
81
+ - uses the existing forward(...) that returns (reconstruction, z_mu, z_sigma)
82
+
83
+ Usage:
84
+ >>> model = Brain2vec.from_pretrained("my_checkpoint.pth", device="cuda")
85
+ >>> image_tensor = preprocess_mri("/path/to/mri.nii.gz", device="cuda")
86
+ >>> reconstruction, z_mu, z_sigma = model.forward(image_tensor)
87
+ """
88
+
89
+ @staticmethod
90
+ def from_pretrained(
91
+ checkpoint_path: Optional[str] = None,
92
+ device: str = "cpu"
93
+ ) -> nn.Module:
94
+ """
95
+ Load a pretrained Brain2vec (AutoencoderKL) if a checkpoint_path is provided.
96
+ Otherwise, return an uninitialized model.
97
+
98
+ Args:
99
+ checkpoint_path (Optional[str]): Path to a .pth checkpoint file.
100
+ device (str): "cpu", "cuda", "mps", etc.
101
+
102
+ Returns:
103
+ nn.Module: The loaded Brain2vec model on the chosen device.
104
+ """
105
+ model = Brain2vec(
106
+ spatial_dims=3,
107
+ in_channels=1,
108
+ out_channels=1,
109
+ latent_channels=1,
110
+ num_channels=(64, 128, 256, 512),
111
+ num_res_blocks=2,
112
+ norm_num_groups=32,
113
+ norm_eps=1e-06,
114
+ attention_levels=(False, False, False, False),
115
+ with_decoder_nonlocal_attn=False,
116
+ with_encoder_nonlocal_attn=False,
117
+ )
118
+
119
+ if checkpoint_path is not None:
120
+ if not os.path.exists(checkpoint_path):
121
+ raise FileNotFoundError(f"Checkpoint {checkpoint_path} not found.")
122
+ state_dict = torch.load(checkpoint_path, map_location=device)
123
+ model.load_state_dict(state_dict)
124
+
125
+ model.to(device)
126
+ model.eval()
127
+ return model
128
+
129
+
130
+ def main() -> None:
131
+ """
132
+ Main function to parse command-line arguments and run inference
133
+ with a pretrained Brain2vec model.
134
+ """
135
+ parser = argparse.ArgumentParser(
136
+ description="Inference script for a Brain2vec (VAE) model."
137
+ )
138
+ parser.add_argument(
139
+ "--checkpoint_path", type=str, required=True,
140
+ help="Path to the .pth checkpoint of the pretrained Brain2vec model."
141
+ )
142
+ parser.add_argument(
143
+ "--output_dir", type=str, default="./vae_inference_outputs",
144
+ help="Directory to save reconstructions and latent parameters."
145
+ )
146
+ # Two ways to supply images: multiple file paths or a CSV
147
+ parser.add_argument(
148
+ "--input_images", type=str, nargs="*",
149
+ help="One or more MRI file paths (e.g. .nii.gz)."
150
+ )
151
+ parser.add_argument(
152
+ "--csv_input", type=str,
153
+ help="Path to a CSV file with an 'image_path' column."
154
+ )
155
+ parser.add_argument(
156
+ "--embeddings_filename",
157
+ type=str,
158
+ required=True,
159
+ help="Filename (in output_dir) to save the stacked z_mu embeddings (e.g. 'all_z_mu.npy')."
160
+ )
161
+ parser.add_argument(
162
+ "--save_recons",
163
+ action="store_true",
164
+ help="If set, saves each reconstruction as .npy. Default is not to save."
165
+ )
166
+
167
+ args = parser.parse_args()
168
+
169
+ os.makedirs(args.output_dir, exist_ok=True)
170
+
171
+ # select the inference device (GPU if available)
172
+ device = "cuda" if torch.cuda.is_available() else "cpu"
173
+
174
+ # load the pretrained model on that device
175
+ model = Brain2vec.from_pretrained(
176
+ checkpoint_path=args.checkpoint_path,
177
+ device=device
178
+ )
179
+
180
+ # Gather image paths
181
+ if args.csv_input:
182
+ df = pd.read_csv(args.csv_input)
183
+ if "image_path" not in df.columns:
184
+ raise ValueError("CSV must contain a column named 'image_path'.")
185
+ image_paths = df["image_path"].tolist()
186
+ else:
187
+ if not args.input_images:
188
+ raise ValueError("Must provide either --csv_input or --input_images.")
189
+ image_paths = args.input_images
190
+
191
+ # Lists for stacking latent parameters later
192
+ all_z_mu = []
193
+ all_z_sigma = []
194
+
195
+ # Inference on each image
196
+ for i, img_path in enumerate(image_paths):
197
+ if not os.path.exists(img_path):
198
+ raise FileNotFoundError(f"Image not found: {img_path}")
199
+
200
+ print(f"[INFO] Processing image {i}: {img_path}")
201
+ img_tensor = preprocess_mri(img_path, device=device)
202
+
203
+ with torch.no_grad():
204
+ recon, z_mu, z_sigma = model.forward(img_tensor)
205
+
206
+ # Convert to NumPy
207
+ recon_np = recon.detach().cpu().numpy() # shape: (1, 1, D, H, W)
208
+ z_mu_np = z_mu.detach().cpu().numpy() # shape: (1, latent_channels, ...)
209
+ z_sigma_np = z_sigma.detach().cpu().numpy()
210
+
211
+ # Save each reconstruction (per image) as .npy
212
+ if args.save_recons:
213
+ recon_path = os.path.join(args.output_dir, f"reconstruction_{i}.npy")
214
+ np.save(recon_path, recon_np)
215
+ print(f"[INFO] Saved reconstruction to {recon_path}")
216
+
217
+ # Store latent parameters for optional combined saving
218
+ all_z_mu.append(z_mu_np)
219
+ all_z_sigma.append(z_sigma_np)
220
+
221
+ # Combine latent parameters from all images and save
222
+ stacked_mu = np.concatenate(all_z_mu, axis=0) # e.g., shape (N, latent_channels, ...)
223
+ stacked_sigma = np.concatenate(all_z_sigma, axis=0) # e.g., shape (N, latent_channels, ...)
224
+
225
+ mu_filename = args.embeddings_filename
226
+ if not mu_filename.lower().endswith(".npy"):
227
+ mu_filename += ".npy"
228
+
229
+ mu_path = os.path.join(args.output_dir, mu_filename)
230
+ sigma_path = os.path.join(args.output_dir, "all_z_sigma.npy")
231
+
232
+ np.save(mu_path, stacked_mu)
233
+ np.save(sigma_path, stacked_sigma)
234
+
235
+ print(f"[INFO] Saved z_mu of shape {stacked_mu.shape} to {mu_path}")
236
+ print(f"[INFO] Saved z_sigma of shape {stacked_sigma.shape} to {sigma_path}")
237
+
238
+
239
+ if __name__ == "__main__":
240
+ main()
inputs_example.csv ADDED
@@ -0,0 +1,6 @@
1
+ image_uid,age,sex,image_path,split
2
+ 0,81,2,/Users/jbrown2/.cache/huggingface/datasets/downloads/extracted/6429865a89f9ae54df1c3c2db5d0f1f25cf7dd43cb87704d76ed08cf8c194aba/OASIS-2/sub-OASIS20133/ses-03/anat/msub-OASIS20133_ses-03_T1w_brain_affine_mni.nii.gz,train
3
+ 1,78,2,/Users/jbrown2/.cache/huggingface/datasets/downloads/extracted/6429865a89f9ae54df1c3c2db5d0f1f25cf7dd43cb87704d76ed08cf8c194aba/OASIS-2/sub-OASIS20133/ses-01/anat/msub-OASIS20133_ses-01_T1w_brain_affine_mni.nii.gz,train
4
+ 2,87,1,/Users/jbrown2/.cache/huggingface/datasets/downloads/extracted/6429865a89f9ae54df1c3c2db5d0f1f25cf7dd43cb87704d76ed08cf8c194aba/OASIS-2/sub-OASIS20105/ses-02/anat/msub-OASIS20105_ses-02_T1w_brain_affine_mni.nii.gz,train
5
+ 3,86,1,/Users/jbrown2/.cache/huggingface/datasets/downloads/extracted/6429865a89f9ae54df1c3c2db5d0f1f25cf7dd43cb87704d76ed08cf8c194aba/OASIS-2/sub-OASIS20105/ses-01/anat/msub-OASIS20105_ses-01_T1w_brain_affine_mni.nii.gz,train
6
+ 4,84,1,/Users/jbrown2/.cache/huggingface/datasets/downloads/extracted/6429865a89f9ae54df1c3c2db5d0f1f25cf7dd43cb87704d76ed08cf8c194aba/OASIS-2/sub-OASIS20102/ses-02/anat/msub-OASIS20102_ses-02_T1w_brain_affine_mni.nii.gz,train
requirements.txt ADDED
@@ -0,0 +1,21 @@
1
+ # PyTorch (CUDA or CPU version).
2
+ torch>=1.12
3
+
4
+ # Install MONAI Generative first
5
+ monai-generative
6
+
7
+ # Install the latest MONAI directly from GitHub (development version)
8
+ git+https://github.com/Project-MONAI/MONAI.git#egg=monai
9
+
10
+ # For perceptual losses in MONAI's generative module.
11
+ lpips
12
+
13
+ # Common Python libraries
14
+ pandas
15
+ numpy
16
+ nibabel
17
+ tqdm
18
+ tensorboard
19
+ matplotlib
20
+ datasets
21
+
train_brain2vec.py ADDED
@@ -0,0 +1,526 @@
1
+ #!/usr/bin/env python3
2
+
3
+ """
4
+ train_brain2vec.py
5
+
6
+ Trains a 3D VAE-based Brain2Vec model using MONAI. This script implements
7
+ autoencoder training with adversarial loss (via a patch discriminator),
8
+ a perceptual loss, and KL divergence regularization for robust latent
9
+ representations.
10
+
11
+ Example usage:
12
+ python train_brain2vec.py \
13
+ --dataset_csv inputs.csv \
14
+ --cache_dir ./ae_cache \
15
+ --output_dir ./ae_output \
16
+ --n_epochs 10
17
+ """
18
+
19
+ import os
20
+ os.environ["PYTORCH_WEIGHTS_ONLY"] = "False"
21
+ from typing import Optional, Union
22
+ import pandas as pd
23
+ import argparse
24
+ import numpy as np
25
+ import warnings
26
+ import torch
27
+ import torch.nn as nn
28
+ from torch import Tensor
29
+ from torch.optim.optimizer import Optimizer
30
+ from torch.nn import L1Loss
31
+ from torch.utils.data import DataLoader
32
+ from torch.amp import autocast
33
+ from torch.amp import GradScaler
34
+ from generative.networks.nets import (
35
+ AutoencoderKL,
36
+ PatchDiscriminator,
37
+ )
38
+ from generative.losses import PerceptualLoss, PatchAdversarialLoss
39
+ from monai.data import Dataset, PersistentDataset
40
+ from monai.transforms.transform import Transform
41
+ from monai import transforms
42
+ from monai.utils import set_determinism
43
+ from monai.data.meta_tensor import MetaTensor
44
+ import torch.serialization
45
+ from numpy.core.multiarray import _reconstruct
46
+ from numpy import ndarray, dtype
47
+ torch.serialization.add_safe_globals([_reconstruct])
48
+ torch.serialization.add_safe_globals([MetaTensor])
49
+ torch.serialization.add_safe_globals([ndarray])
50
+ torch.serialization.add_safe_globals([dtype])
51
+ from tqdm import tqdm
52
+ import matplotlib.pyplot as plt
53
+ from torch.utils.tensorboard import SummaryWriter
54
+
55
+ # voxel resolution
56
+ RESOLUTION = 2
57
+
58
+ # shape of the MNI152 (1mm^3) template
59
+ INPUT_SHAPE_1mm = (182, 218, 182)
60
+
61
+ # resampling the MNI152 to (1.5mm^3)
62
+ INPUT_SHAPE_1p5mm = (122, 146, 122)
63
+
64
+ # Adjusting the dimensions to be divisible by 8 (2^3 where 3 are the downsampling layers of the AE)
65
+ #INPUT_SHAPE_AE = (120, 144, 120)
66
+ INPUT_SHAPE_AE = (80, 96, 80)
67
+
68
+ # Latent shape of the autoencoder
69
+ LATENT_SHAPE_AE = (1, 10, 12, 10)
70
+
71
+
72
+ def load_if(checkpoints_path: Optional[str], network: nn.Module) -> nn.Module:
73
+ """
74
+ Load pretrained weights if available.
75
+
76
+ Args:
77
+ checkpoints_path (Optional[str]): path of the checkpoints
78
+ network (nn.Module): the neural network to initialize
79
+
80
+ Returns:
81
+ nn.Module: the initialized neural network
82
+ """
83
+ if checkpoints_path is not None:
84
+ assert os.path.exists(checkpoints_path), 'Invalid path'
85
+ network.load_state_dict(torch.load(checkpoints_path))
86
+ return network
87
+
88
+
89
+ def init_autoencoder(checkpoints_path: Optional[str] = None) -> nn.Module:
90
+ """
91
+ Load the KL autoencoder (pretrained if `checkpoints_path` points to previous params).
92
+
93
+ Args:
94
+ checkpoints_path (Optional[str], optional): path of the checkpoints. Defaults to None.
95
+
96
+ Returns:
97
+ nn.Module: the KL autoencoder
98
+ """
99
+ autoencoder = AutoencoderKL(spatial_dims=3,
100
+ in_channels=1,
101
+ out_channels=1,
102
+ latent_channels=1, #3,
103
+ num_channels=(64, 128, 256, 512),
104
+ num_res_blocks=2,
105
+ norm_num_groups=32,
106
+ norm_eps=1e-06,
107
+ attention_levels=(False, False, False, False),
108
+ with_decoder_nonlocal_attn=False,
109
+ with_encoder_nonlocal_attn=False)
110
+ return load_if(checkpoints_path, autoencoder)
111
+
112
+
113
+ def init_patch_discriminator(checkpoints_path: Optional[str] = None) -> nn.Module:
114
+ """
115
+ Load the patch discriminator (pretrained if `checkpoints_path` points to previous params).
116
+
117
+ Args:
118
+ checkpoints_path (Optional[str], optional): path of the checkpoints. Defaults to None.
119
+
120
+ Returns:
121
+ nn.Module: the patch discriminator
122
+ """
123
+ patch_discriminator = PatchDiscriminator(spatial_dims=3,
124
+ num_layers_d=3,
125
+ num_channels=32,
126
+ in_channels=1,
127
+ out_channels=1)
128
+ return load_if(checkpoints_path, patch_discriminator)
129
+
130
+
131
+ class KLDivergenceLoss:
132
+ """
133
+ A class for computing the Kullback-Leibler divergence loss.
134
+ """
135
+
136
+ def __call__(self, z_mu: Tensor, z_sigma: Tensor) -> Tensor:
137
+ """
138
+ Computes the KL divergence loss for the given parameters.
139
+
140
+ Args:
141
+ z_mu (Tensor): The mean of the distribution.
142
+ z_sigma (Tensor): The standard deviation of the distribution.
143
+
144
+ Returns:
145
+ Tensor: The computed KL divergence loss, averaged over the batch size.
146
+ """
147
+
148
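+ # closed-form KL divergence between N(z_mu, z_sigma^2) and N(0, I), summed over the latent dimensions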
+ kl_loss = 0.5 * torch.sum(z_mu.pow(2) + z_sigma.pow(2) - torch.log(z_sigma.pow(2)) - 1, dim=[1, 2, 3, 4])
149
+ return torch.sum(kl_loss) / kl_loss.shape[0]
150
+
151
+
152
+ class GradientAccumulation:
153
+ """
154
+ Implements gradient accumulation to facilitate training with larger
155
+ effective batch sizes than what can be physically accommodated in memory.
156
+ """
157
+
158
+ def __init__(self,
159
+ actual_batch_size: int,
160
+ expect_batch_size: int,
161
+ loader_len: int,
162
+ optimizer: Optimizer,
163
+ grad_scaler: Optional[GradScaler] = None) -> None:
164
+ """
165
+ Initializes the GradientAccumulation instance with the necessary parameters for
166
+ managing gradient accumulation.
167
+
168
+ Args:
169
+ actual_batch_size (int): The size of the mini-batches actually used in training.
170
+ expect_batch_size (int): The desired (effective) batch size to simulate through gradient accumulation.
171
+ loader_len (int): The length of the data loader, representing the total number of mini-batches.
172
+ optimizer (Optimizer): The optimizer used for performing optimization steps.
173
+ grad_scaler (Optional[GradScaler], optional): A GradScaler for mixed precision training. Defaults to None.
174
+
175
+ Raises:
176
+ AssertionError: If `expect_batch_size` is not divisible by `actual_batch_size`.
177
+ """
178
+
179
+ assert expect_batch_size % actual_batch_size == 0, \
180
+ 'expect_batch_size must be divisible by actual_batch_size'
181
+ self.actual_batch_size = actual_batch_size
182
+ self.expect_batch_size = expect_batch_size
183
+ self.loader_len = loader_len
184
+ self.optimizer = optimizer
185
+ self.grad_scaler = grad_scaler
186
+
187
+ # if the expected batch size is N=KM, and the actual batch size
188
+ # is M, then we need to accumulate gradient from N / M = K optimization steps.
189
+ self.steps_until_update = expect_batch_size / actual_batch_size
190
+
191
+ def step(self, loss: Tensor, step: int) -> None:
192
+ """
193
+ Performs a backward pass for the given loss and potentially executes an optimization
194
+ step if the conditions for gradient accumulation are met. The optimization step is taken
195
+ only after a specified number of steps (defined by the expected batch size) or at the end
196
+ of the dataset.
197
+
198
+ Args:
199
+ loss (Tensor): The loss value for the current forward pass.
200
+ step (int): The current step (mini-batch index) within the epoch.
201
+ """
202
+ loss = loss / self.expect_batch_size
203
+
204
+ if self.grad_scaler is not None:
205
+ self.grad_scaler.scale(loss).backward()
206
+ else:
207
+ loss.backward()
208
+ if (step + 1) % self.steps_until_update == 0 or (step + 1) == self.loader_len:
209
+ if self.grad_scaler is not None:
210
+ self.grad_scaler.step(self.optimizer)
211
+ self.grad_scaler.update()
212
+ else:
213
+ self.optimizer.step()
214
+ self.optimizer.zero_grad(set_to_none=True)
215
+
216
+
217
+ class AverageLoss:
218
+ """
219
+ Utility class to track losses
220
+ and metrics during training.
221
+ """
222
+
223
+ def __init__(self):
224
+ self.losses_accumulator = {}
225
+
226
+ def put(self, loss_key:str, loss_value:Union[int,float]) -> None:
227
+ """
228
+ Store value
229
+
230
+ Args:
231
+ loss_key (str): Metric name
232
+ loss_value (int | float): Metric value to store
233
+ """
234
+ if loss_key not in self.losses_accumulator:
235
+ self.losses_accumulator[loss_key] = []
236
+ self.losses_accumulator[loss_key].append(loss_value)
237
+
238
+ def pop_avg(self, loss_key:str) -> float:
239
+ """
240
+ Average the stored values of a given metric
241
+
242
+ Args:
243
+ loss_key (str): Metric name
244
+
245
+ Returns:
246
+ float: average of the stored values
247
+ """
248
+ if loss_key not in self.losses_accumulator:
249
+ return None
250
+ losses = self.losses_accumulator[loss_key]
251
+ self.losses_accumulator[loss_key] = []
252
+ return sum(losses) / len(losses)
253
+
254
+ def to_tensorboard(self, writer: SummaryWriter, step: int):
255
+ """
256
+ Logs the average value of all the metrics stored
257
+ into Tensorboard.
258
+
259
+ Args:
260
+ writer (SummaryWriter): Tensorboard writer
261
+ step (int): Tensorboard logging global step
262
+ """
263
+ for metric_key in self.losses_accumulator.keys():
264
+ writer.add_scalar(metric_key, self.pop_avg(metric_key), step)
265
+
266
+
267
+ def get_dataset_from_pd(df: pd.DataFrame, transforms_fn: Transform, cache_dir: Optional[str]) -> Union[Dataset,PersistentDataset]:
268
+ """
269
+ If `cache_dir` is defined, returns a `monai.data.PersistentDataset`.
270
+ Otherwise, returns a simple `monai.data.Dataset`.
271
+
272
+ Args:
273
+ df (pd.DataFrame): Dataframe describing each image in the longitudinal dataset.
274
+ transforms_fn (Transform): Set of transformations
275
+ cache_dir (Optional[str]): Cache directory (ensure enough storage is available)
276
+
277
+ Returns:
278
+ Dataset|PersistentDataset: The dataset
279
+ """
280
+ assert cache_dir is None or os.path.exists(cache_dir), 'Invalid cache directory path'
281
+ data = df.to_dict(orient='records')
282
+ return Dataset(data=data, transform=transforms_fn) if cache_dir is None \
283
+ else PersistentDataset(data=data, transform=transforms_fn, cache_dir=cache_dir)
284
+
285
+
286
+ def tb_display_reconstruction(writer, step, image, recon):
287
+ """
288
+ Display reconstruction in TensorBoard during AE training.
289
+ """
290
+ plt.style.use('dark_background')
291
+ _, ax = plt.subplots(ncols=3, nrows=2, figsize=(7, 5))
292
+ for _ax in ax.flatten(): _ax.set_axis_off()
293
+
294
+ if len(image.shape) == 4: image = image.squeeze(0)
295
+ if len(recon.shape) == 4: recon = recon.squeeze(0)
296
+
297
+ ax[0, 0].set_title('original image', color='cyan')
298
+ ax[0, 0].imshow(image[image.shape[0] // 2, :, :], cmap='gray')
299
+ ax[0, 1].imshow(image[:, image.shape[1] // 2, :], cmap='gray')
300
+ ax[0, 2].imshow(image[:, :, image.shape[2] // 2], cmap='gray')
301
+
302
+ ax[1, 0].set_title('reconstructed image', color='magenta')
303
+ ax[1, 0].imshow(recon[recon.shape[0] // 2, :, :], cmap='gray')
304
+ ax[1, 1].imshow(recon[:, recon.shape[1] // 2, :], cmap='gray')
305
+ ax[1, 2].imshow(recon[:, :, recon.shape[2] // 2], cmap='gray')
306
+
307
+ plt.tight_layout()
308
+ writer.add_figure('Reconstruction', plt.gcf(), global_step=step)
309
+
310
+
311
+ def set_environment(seed: int = 0) -> None:
312
+ """
313
+ Set deterministic behavior for reproducibility.
314
+
315
+ Args:
316
+ seed (int, optional): Seed value. Defaults to 0.
317
+ """
318
+ set_determinism(seed)
319
+
320
+
321
+ def train(
322
+ dataset_csv: str,
323
+ cache_dir: str,
324
+ output_dir: str,
325
+ aekl_ckpt: Optional[str] = None,
326
+ disc_ckpt: Optional[str] = None,
327
+ num_workers: int = 8,
328
+ n_epochs: int = 5,
329
+ max_batch_size: int = 2,
330
+ batch_size: int = 16,
331
+ lr: float = 1e-4,
332
+ aug_p: float = 0.8,
333
+ device: str = ('cuda' if torch.cuda.is_available() else
334
+ 'cpu'),
335
+ ) -> None:
336
+ """
337
+ Train the autoencoder and discriminator models.
338
+
339
+ Args:
340
+ dataset_csv (str): Path to the dataset CSV file.
341
+ cache_dir (str): Directory for caching data.
342
+ output_dir (str): Directory to save model checkpoints.
343
+ aekl_ckpt (Optional[str], optional): Path to the autoencoder checkpoint. Defaults to None.
344
+ disc_ckpt (Optional[str], optional): Path to the discriminator checkpoint. Defaults to None.
345
+ num_workers (int, optional): Number of data loader workers. Defaults to 8.
346
+ n_epochs (int, optional): Number of training epochs. Defaults to 5.
347
+ max_batch_size (int, optional): Actual batch size per iteration. Defaults to 2.
348
+ batch_size (int, optional): Expected (effective) batch size. Defaults to 16.
349
+ lr (float, optional): Learning rate. Defaults to 1e-4.
350
+ aug_p (float, optional): Augmentation probability. Defaults to 0.8.
351
+ device (str, optional): Device to run the training on. Defaults to 'cuda' if available.
352
+ """
353
+ set_environment(0)
354
+
355
+ transforms_fn = transforms.Compose([
356
+ transforms.CopyItemsD(keys={'image_path'}, names=['image']),
357
+ transforms.LoadImageD(image_only=True, keys=['image']),
358
+ transforms.EnsureChannelFirstD(keys=['image']),
359
+ transforms.SpacingD(pixdim=2, keys=['image']),
360
+ transforms.ResizeWithPadOrCropD(spatial_size=(80, 96, 80), mode='minimum', keys=['image']),
361
+ transforms.ScaleIntensityD(minv=0, maxv=1, keys=['image'])
362
+ ])
363
+
364
+ dataset_df = pd.read_csv(dataset_csv)
365
+ train_df = dataset_df[dataset_df.split == 'train']
366
+ trainset = get_dataset_from_pd(train_df, transforms_fn, cache_dir)
367
+
368
+ train_loader = DataLoader(
369
+ dataset=trainset,
370
+ num_workers=num_workers,
371
+ batch_size=max_batch_size,
372
+ shuffle=True,
373
+ persistent_workers=True,
374
+ pin_memory=True,
375
+ )
376
+
377
+ print('Device is %s' %(device))
378
+ autoencoder = init_autoencoder(aekl_ckpt).to(device)
379
+ discriminator = init_patch_discriminator(disc_ckpt).to(device)
380
+
381
+ # Loss Weights
382
+ adv_weight = 0.025
383
+ perceptual_weight = 0.001
384
+ kl_weight = 1e-7
385
+
386
+ # Loss Functions
387
+ l1_loss_fn = L1Loss()
388
+ kl_loss_fn = KLDivergenceLoss()
389
+ adv_loss_fn = PatchAdversarialLoss(criterion="least_squares")
390
+
391
+ with warnings.catch_warnings():
392
+ warnings.simplefilter("ignore")
393
+ perc_loss_fn = PerceptualLoss(
394
+ spatial_dims=3,
395
+ network_type="squeeze",
396
+ is_fake_3d=True,
397
+ fake_3d_ratio=0.2
398
+ ).to(device)
399
+
400
+ # Optimizers
401
+ optimizer_g = torch.optim.Adam(autoencoder.parameters(), lr=lr)
402
+ optimizer_d = torch.optim.Adam(discriminator.parameters(), lr=lr)
403
+
404
+ # Gradient Accumulation
405
+ gradacc_g = GradientAccumulation(
406
+ actual_batch_size=max_batch_size,
407
+ expect_batch_size=batch_size,
408
+ loader_len=len(train_loader),
409
+ optimizer=optimizer_g,
410
+ grad_scaler=GradScaler()
411
+ )
412
+
413
+ gradacc_d = GradientAccumulation(
414
+ actual_batch_size=max_batch_size,
415
+ expect_batch_size=batch_size,
416
+ loader_len=len(train_loader),
417
+ optimizer=optimizer_d,
418
+ grad_scaler=GradScaler()
419
+ )
420
+
421
+ # Logging
422
+ avgloss = AverageLoss()
423
+ writer = SummaryWriter()
424
+ total_counter = 0
425
+
426
+ for epoch in range(n_epochs):
427
+ print(f"[DEBUG] Starting epoch {epoch}/{n_epochs-1}")
428
+ autoencoder.train()
429
+ progress_bar = tqdm(enumerate(train_loader), total=len(train_loader))
430
+ progress_bar.set_description(f'Epoch {epoch}')
431
+
432
+ for step, batch in progress_bar:
433
+ # Generator Training
434
+ with autocast(device, enabled=True):
435
+ images = batch["image"].to(device)
436
+ reconstruction, z_mu, z_sigma = autoencoder(images)
437
+
438
+ logits_fake = discriminator(reconstruction.contiguous().float())[-1]
439
+
440
+ rec_loss = l1_loss_fn(reconstruction.float(), images.float())
441
+ kl_loss = kl_weight * kl_loss_fn(z_mu, z_sigma)
442
+ per_loss = perceptual_weight * perc_loss_fn(reconstruction.float(), images.float())
443
+ gen_loss = adv_weight * adv_loss_fn(logits_fake, target_is_real=True, for_discriminator=False)
444
+
445
+ loss_g = rec_loss + kl_loss + per_loss + gen_loss
446
+
447
+ gradacc_g.step(loss_g, step)
448
+
449
+ # Discriminator Training
450
+ with autocast(device, enabled=True):
451
+ logits_fake = discriminator(reconstruction.contiguous().detach())[-1]
452
+ d_loss_fake = adv_loss_fn(logits_fake, target_is_real=False, for_discriminator=True)
453
+ logits_real = discriminator(images.contiguous().detach())[-1]
454
+ d_loss_real = adv_loss_fn(logits_real, target_is_real=True, for_discriminator=True)
455
+ discriminator_loss = (d_loss_fake + d_loss_real) * 0.5
456
+ loss_d = adv_weight * discriminator_loss
457
+
458
+ gradacc_d.step(loss_d, step)
459
+
460
+ # Logging
461
+ avgloss.put('Generator/reconstruction_loss', rec_loss.item())
462
+ avgloss.put('Generator/perceptual_loss', per_loss.item())
463
+ avgloss.put('Generator/adversarial_loss', gen_loss.item())
464
+ avgloss.put('Generator/kl_regularization', kl_loss.item())
465
+ avgloss.put('Discriminator/adversarial_loss', loss_d.item())
466
+
467
+ if total_counter % 10 == 0:
468
+ step_log = total_counter // 10
469
+ avgloss.to_tensorboard(writer, step_log)
470
+ tb_display_reconstruction(
471
+ writer,
472
+ step_log,
473
+ images[0].detach().cpu(),
474
+ reconstruction[0].detach().cpu()
475
+ )
476
+
477
+ total_counter += 1
478
+
479
+ # Save the model after each epoch.
480
+ os.makedirs(output_dir, exist_ok=True)
481
+ torch.save(discriminator.state_dict(), os.path.join(output_dir, f'discriminator-ep-{epoch}.pth'))
482
+ torch.save(autoencoder.state_dict(), os.path.join(output_dir, f'autoencoder-ep-{epoch}.pth'))
483
+
484
+ writer.close()
485
+ print("Training completed and models saved.")
486
+
487
+
488
+ def main():
489
+ """
490
+ Main function to parse command-line arguments and run train().
491
+ """
492
+ import argparse
493
+
494
+ parser = argparse.ArgumentParser(description="brain2vec Training Script")
495
+
496
+ parser.add_argument('--dataset_csv', type=str, required=True, help='Path to the dataset CSV file.')
497
+ parser.add_argument('--cache_dir', type=str, required=True, help='Directory for caching data.')
498
+ parser.add_argument('--output_dir', type=str, required=True, help='Directory to save model checkpoints.')
499
+ parser.add_argument('--aekl_ckpt', type=str, default=None, help='Path to the autoencoder checkpoint.')
500
+ parser.add_argument('--disc_ckpt', type=str, default=None, help='Path to the discriminator checkpoint.')
501
+ parser.add_argument('--num_workers', type=int, default=8, help='Number of data loader workers.')
502
+ parser.add_argument('--n_epochs', type=int, default=5, help='Number of training epochs.')
503
+ parser.add_argument('--max_batch_size', type=int, default=2, help='Actual batch size per iteration.')
504
+ parser.add_argument('--batch_size', type=int, default=16, help='Expected (effective) batch size.')
505
+ parser.add_argument('--lr', type=float, default=1e-4, help='Learning rate.')
506
+ parser.add_argument('--aug_p', type=float, default=0.8, help='Augmentation probability.')
507
+
508
+ args = parser.parse_args()
509
+
510
+ train(
511
+ dataset_csv=args.dataset_csv,
512
+ cache_dir=args.cache_dir,
513
+ output_dir=args.output_dir,
514
+ aekl_ckpt=args.aekl_ckpt,
515
+ disc_ckpt=args.disc_ckpt,
516
+ num_workers=args.num_workers,
517
+ n_epochs=args.n_epochs,
518
+ max_batch_size=args.max_batch_size,
519
+ batch_size=args.batch_size,
520
+ lr=args.lr,
521
+ aug_p=args.aug_p,
522
+ )
523
+
524
+
525
+ if __name__ == '__main__':
526
+ main()