## Build and Compose Conditioners

### Overview
Protein design via Chroma is highly customizable and programmable. Our robust Conditioner framework enables automatic conditional sampling tailored to a diverse array of protein specifications. This can involve either restraints (which bias the distribution of states using classifier guidance) or constraints (that directly limit the scope of the underlying sampling process). For a detailed explanation, refer to Supplementary Appendix M in our paper. We offer a variety of pre-defined conditioners, including those for managing substructure, symmetry, shape, semantics, and even natural-language prompts (see `chroma.layers.structure.conditioners`). These conditioners can be utilized in any combination to suit your specific needs.

### Composing Conditioners

Conditioners in Chroma can be combined using `conditioners.ComposedConditioner`, akin to how layers are sequenced in `torch.nn.Sequential`. You define the individual conditioners and pass them as a single list; the composed conditioner then applies their transformations sequentially, in list order.

```python
composed_conditioner = conditioners.ComposedConditioner([conditioner1, conditioner2, conditioner3])
```
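
To make the ordering concrete, here is a minimal toy sketch of sequential composition. It is not Chroma's implementation: `compose`, `add_restraint`, and `duplicate` are hypothetical stand-ins, and the state is reduced to a coordinate list plus a scalar energy rather than Chroma's `(X, C, O, U, t)` tuple. One step modifies the energy (restraint-like) and the other transforms the coordinates (constraint-like).

```python
from typing import Callable, List, Tuple

# Toy state: (coordinates, energy). An assumption standing in for (X, C, O, U, t).
State = Tuple[List[float], float]


def compose(steps: List[Callable[[State], State]]) -> Callable[[State], State]:
    """Return a function that applies each step in list order."""
    def apply(state: State) -> State:
        for step in steps:
            state = step(state)
        return state
    return apply


def add_restraint(state: State) -> State:
    """Restraint-like step: leave coordinates alone, add a term to the energy."""
    x, u = state
    return x, u + sum(v * v for v in x)


def duplicate(state: State) -> State:
    """Constraint-like step: transform the coordinates (here, copy them once)."""
    x, u = state
    return x + x, u


start: State = ([1.0, 2.0], 0.0)
restraint_first = compose([add_restraint, duplicate])(start)
copy_first = compose([duplicate, add_restraint])(start)
print(restraint_first)
print(copy_first)
```

The two orderings yield the same coordinates but different energies, because in the second case the restraint is evaluated on the already-duplicated coordinates. The same reasoning applies when choosing the order of real conditioners.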

#### Setup

In [1]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"
%pip install generate-chroma > /dev/null 2>&1
from chroma import api
api.register_key(input("Enter API key: "))

Note: you may need to restart the kernel to use updated packages.




#### Example 1: Combining Symmetry and Secondary Structure
In this example, we first apply secondary-structure guidance and then impose cyclic symmetry. A secondary-structure classifier conditionally samples a beta-rich asymmetric unit (AU), which is then symmetrized.

In [10]:
from chroma.models import Chroma
from chroma.layers.structure import conditioners

chroma = Chroma()
# Condition on CATH class 2 (mostly beta)
beta = conditioners.ProClassConditioner('cath', "2", weight=5, max_norm=20)
c_symmetry = conditioners.SymmetryConditioner(G="C_3", num_chain_neighbors=2)
composed_cond = conditioners.ComposedConditioner([beta, c_symmetry])

symm_beta = chroma.sample(chain_lengths=[100],
    conditioner=composed_cond,
    langevin_factor=8,
    inverse_temperature=8,
    sde_func="langevin",
    steps=500)

symm_beta

NGLWidget()
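
For intuition about the symmetrization step, the sketch below builds the copies of a cyclic `C_n` assembly by rotating an asymmetric unit about a single axis. This is an illustrative assumption, not the internals of `SymmetryConditioner`: the function `cyclic_copies` is hypothetical, and the symmetry axis is taken to be the z axis through the origin purely for simplicity.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def cyclic_copies(points: List[Point], n: int) -> List[List[Point]]:
    """Return n rotated copies of `points`, rotating by 2*pi*k/n about the z axis."""
    copies = []
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        c, s = math.cos(theta), math.sin(theta)
        # Standard 2D rotation in the xy plane; z is unchanged.
        copies.append([(c * x - s * y, s * x + c * y, z) for x, y, z in points])
    return copies


# Hypothetical two-atom asymmetric unit, placed away from the axis.
au = [(10.0, 0.0, 1.0), (12.0, 3.0, -2.0)]
assembly = cyclic_copies(au, 3)
print(len(assembly), "copies of", len(au), "points each")
```

Because every copy is generated from the same asymmetric unit, only the AU coordinates are free variables; the remaining copies follow from the rotation.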

#### Example 2: Merging Symmetry and Substructure
Here, our goal is to construct a symmetric assembly from a single-chain protein, partially redesigning it so that three identical AUs merge into a cyclic complex. We begin by defining the backbone regions targeted for redesign and then reposition the AU to prevent clashes during symmetrization. This is followed by the symmetrization operation itself.


In [14]:
from chroma.data import Protein

PDB_ID = '3BDI'
chroma = Chroma()

protein = Protein(PDB_ID, canonicalize=True, device='cuda')
# regenerate residues with X coord < 25 A and y coord < 25 A
substruct_conditioner = conditioners.SubstructureConditioner(
    protein, backbone_model=chroma.backbone_network, selection="x < 25 and y < 25")

# C_3 symmetry
c_symmetry = conditioners.SymmetryConditioner(G="C_3", num_chain_neighbors=3)

# Composing
composed_cond = conditioners.ComposedConditioner([substruct_conditioner, c_symmetry])

protein, trajectories = chroma.sample(
    protein_init=protein,
    conditioner=composed_cond,
    langevin_factor=4.0,
    langevin_isothermal=True,
    inverse_temperature=8.0,
    sde_func='langevin',
    steps=500,
    full_output=True,
)

protein

RuntimeError: torch.linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 27 is not positive-definite).

### Build your own conditioners: 2D protein lattices

An attractive aspect of this conditioner framework is that it is very general, enabling both constraints (which involve operations on $x$) and restraints (which amount to changes to $U$). At the same time, generation under restraints can still be (and often is) challenging, as the resulting effective energy landscape can become arbitrarily rugged and difficult to integrate. We therefore advise caution when using and developing new conditioners or conditioner combinations. We find that inspecting diffusion trajectories (including unconstrained and denoised trajectories, $\hat{x}_t$ and $\tilde{x}_t$) can be a good tool for identifying integration challenges and defining either better conditioner forms or better sampling regimes.

Here we present how to build a conditioner that generates a periodic 2D lattice. You can easily extend this code snippet to generate 3D protein materials.

In [13]:
import torch

class Lattice2DConditioner(conditioners.Conditioner):
    def __init__(self, M, N, cell):
        super().__init__()
        # Setup the coordinates of a 2D lattice
        self.order = M*N
        x = torch.arange(M) * cell[0]
        y = torch.arange(N) * cell[1]
        xx, yy = torch.meshgrid(x, y, indexing="ij")
        dX = torch.stack([xx.flatten(), yy.flatten(), torch.zeros(M * N)], dim=1)
        self.register_buffer("dX", dX)

    def forward(self, X, C, O, U, t):
        # Tessellate the unit cell on the lattice
        X = (X[:,None,...] + self.dX[None,:,None,None]).reshape(1, -1, 4, 3)
        C = torch.cat([C + C.unique().max() * i for i in range(self.dX.shape[0])], dim=1)
        # Average the gradient
        X.register_hook(lambda gradX: gradX / self.order)
        return X, C, O, U, t

chroma = Chroma()
M, N = 3, 3
conditioner = Lattice2DConditioner(M=M, N=N, cell=[25., 25.]).cuda()
protein, trajectories = chroma.sample(
    chain_lengths=[80], conditioner=conditioner, sde_func='langevin',
    potts_symmetry_order=conditioner.order,
    full_output=True
)

protein

NGLWidget()
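
In the conditioner above, the lattice is fully described by the per-copy translations stored in the `dX` buffer, so extending to 3D mainly means generating offsets along a third axis. The following is a minimal sketch in plain Python; `lattice_offsets` is a hypothetical helper, not part of Chroma, and the cell dimensions are arbitrary example values.

```python
from itertools import product
from typing import List, Sequence, Tuple


def lattice_offsets(counts: Sequence[int], cell: Sequence[float]) -> List[Tuple[float, ...]]:
    """Return one 3D translation vector per lattice site.

    counts: number of repeats along each axis, e.g. (M, N) or (M, N, K).
    cell:   cell edge length along each axis, same length as counts.
    """
    if len(counts) != len(cell) or not 1 <= len(counts) <= 3:
        raise ValueError("counts and cell must have the same length (1 to 3)")
    offsets = []
    # product() varies the last index fastest, matching meshgrid(indexing="ij") + flatten.
    for idx in product(*(range(n) for n in counts)):
        vec = [i * c for i, c in zip(idx, cell)]
        vec += [0.0] * (3 - len(vec))  # pad missing axes with zeros
        offsets.append(tuple(vec))
    return offsets


offsets_2d = lattice_offsets((3, 3), (25.0, 25.0))
offsets_3d = lattice_offsets((2, 2, 2), (25.0, 25.0, 40.0))
print(len(offsets_2d), len(offsets_3d))
```

Under these assumptions, a 3D variant of the conditioner could register `torch.tensor(offsets_3d)` as `dX` and set `order` to `len(offsets_3d)`; the `forward` method shown above only depends on `self.dX.shape[0]` and `self.order`.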

#### Notes on Troubleshooting
1. The order in which you apply conditioners matters. In general, apply stringent, global constraints last. For instance, symmetry, which affects the entire complex, should come last in the conditioner list.
2. When troubleshooting a conditioner, test it on a single protein state to verify that the resulting transformation matches your expectations.
3. If your conditioner, like the `SymmetryConditioner`, makes multiple copies of a single protein, divide the pull-back gradients by the number of copies, as the `Lattice2DConditioner` does. This prevents excessive gradient accumulation on the asymmetric unit. Refer to Appendix M for more details.
4. You may need to adjust sampling hyperparameters when experimenting with new conditioners. Key parameters include `langevin_factor`, `inverse_temperature`, `langevin_isothermal`, `steps`, and the guidance scale (especially when applying restraints). For hard constraints, `sde_func='langevin'` is usually advisable.
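
The gradient rescaling described above for conditioners that copy a protein can be checked on a toy problem. The sketch below is an assumption-laden illustration, not Chroma code: it uses a scalar coordinate, a hypothetical quadratic per-copy energy, and a finite-difference gradient in place of autograd. In `Lattice2DConditioner`, the equivalent rescaling is done by the hook `X.register_hook(lambda gradX: gradX / self.order)`.

```python
def per_copy_energy(x: float) -> float:
    """Toy quadratic energy for one copy; its derivative with respect to x is x."""
    return 0.5 * x * x


def tiled_energy(x: float, n_copies: int) -> float:
    """Total energy when the same coordinate x is replicated n_copies times."""
    return sum(per_copy_energy(x) for _ in range(n_copies))


def numerical_grad(f, x: float, eps: float = 1e-6) -> float:
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


x0 = 2.0
n_copies = 9  # e.g. a 3 x 3 lattice

grad_single = numerical_grad(lambda x: tiled_energy(x, 1), x0)
grad_tiled = numerical_grad(lambda x: tiled_energy(x, n_copies), x0)
grad_rescaled = grad_tiled / n_copies  # divide by the number of copies

print(grad_single, grad_tiled, grad_rescaled)
```

Without the division, the gradient on the shared coordinate grows linearly with the number of copies; dividing by the copy count restores the single-copy scale.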