# Start using Chroma

<div style="border:2px solid #f7e097; padding:10px; margin-top:20px; background-color:#fefcd5; border-radius: 5px;">
    🔑 <b>Note:</b> To generate proteins with Chroma, you'll need an API key from <a href="https://chroma-weights.generatebiomedicines.com">chroma-weights.generatebiomedicines.com</a>. Execute the cell below and enter your key after accepting the license.
</div>




In [1]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"
%pip install generate-chroma > /dev/null 2>&1

Note: you may need to restart the kernel to use updated packages.


In [2]:
import torch
from chroma import Chroma, Protein, conditioners, api
device = 'cuda' if torch.cuda.is_available() else 'cpu'
api.register_key(input("Enter API key: "))  # Prompt for the key instead of embedding it in the notebook



To generate protein samples with Chroma, initialize the model and call the sample method. The sample method generates a protein backbone, designs a sequence, and returns a `Protein` object.

In [3]:
# Initialize the Model
chroma = Chroma()

# Sample a Protein
protein = chroma.sample()

Using cached data from /tmp/chroma_weights/90e339502ae6b372797414167ce5a632/weights.pt
Loaded from cache
Using cached data from /tmp/chroma_weights/03a3a9af343ae74998768a2711c8b7ce/weights.pt
Loaded from cache


Integrating SDE:   0%|          | 0/500 [00:00<?, ?it/s]

RuntimeError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 23.69 GiB total capacity; 127.36 MiB already allocated; 13.19 MiB free; 140.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The `Protein` object enables one-line inspection, saving, and loading of proteins.

In [None]:
print(protein) # Inspect the sequence of the protein sample
protein.to('chroma_sample.cif') # Save the sample to disk
protein = Protein('chroma_sample.cif') # Load a protein from disk

Protein: system
> Chain A (100 residues)
MKSIEEKLKEIIDKAKELGCDDCANRLKQVLDEIKRNKENKCEAYKKAIDALKSIVDELERRAQELASRDPELGKQAREQVENIKKEIDELIKEIKKSCA




In [None]:
display(protein)

NGLWidget()

In [None]:
# Calculate sample scores
elbo = chroma.score(protein)['elbo'].score
print(f'sample elbo: {elbo}')

Integrating diffusion metrics:   0%|          | 0/50 [00:00<?, ?it/s]

TypeError: unsupported operand type(s) for |: 'dict' and 'dict'

---

# Conditioning

Chroma conditioners allow us to program proteins. In the following examples we will show conditional generation for `Infilling`, `Symmetry`, `Shape`, `Protein Classes`, and `Natural Language`.

## Symmetry

Chroma can generate symmetric proteins with the help of the symmetry conditioner. We demonstrate a minimal example of conditioning on the cyclic point group with a 7-fold rotation axis. This point group has 7 asymmetric units arranged in a circle. Each subunit has 100 residues in this example. The following parameters can be adjusted below:

* `SYMMETRY_GROUP`: symmetry group, chosen from {"C_2", "C_3", ..., "D_2", "D_3", ..., "T", "O", "I"}.
* `SUBUNIT_SIZES`: chain lengths for the asymmetric unit, e.g. `[100]` or `[100, 150]`. The asymmetric unit may contain more than one chain.
* `KNBR`: number of neighbors to attend to during sampling. The maximum allowed is the total number of asymmetric units in the protein complex minus 1.
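To make the geometry of a cyclic group concrete, the sketch below builds the 7 copies of a toy asymmetric unit by rotating it about the z-axis. It is a standalone illustration and not how `SymmetryConditioner` works internally; the helper `cyclic_copies` and the toy coordinates are hypothetical.

```python
import math

def cyclic_copies(unit, n):
    """Return n copies of `unit` (a list of (x, y, z) points) rotated about the z-axis.

    Hypothetical helper for illustration; not part of the Chroma API.
    """
    copies = []
    for k in range(n):
        angle = 2.0 * math.pi * k / n
        c, s = math.cos(angle), math.sin(angle)
        copies.append([(c * x - s * y, s * x + c * y, z) for x, y, z in unit])
    return copies

# Toy asymmetric unit: 100 points on a short helix, offset 10 units from the axis.
unit = [(10.0 + math.cos(0.6 * i), math.sin(0.6 * i), 0.15 * i) for i in range(100)]
complex_c7 = cyclic_copies(unit, 7)

num_units = len(complex_c7)                                 # asymmetric units in C_7
total_residues = sum(len(chain) for chain in complex_c7)    # residues in the full complex
max_knbr = num_units - 1                                    # upper bound on KNBR stated above
print(num_units, total_residues, max_knbr)
```

With `SUBUNIT_SIZES = [100]` and `C_7`, the full complex therefore has 700 residues, and `KNBR` can be at most 6.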


In [None]:
SYMMETRY_GROUP = "C_7"
SUBUNIT_SIZES = [100]
KNBR = 2

In [None]:
# Draw a Sample
torch.manual_seed(0)
conditioner = conditioners.SymmetryConditioner(G=SYMMETRY_GROUP, num_chain_neighbors=KNBR)
symmetric_protein = chroma.sample(
    chain_lengths=SUBUNIT_SIZES,
    conditioner=conditioner,
    langevin_factor=8,
    inverse_temperature=8,
    sde_func="langevin",
    potts_symmetry_order=conditioner.potts_symmetry_order)

In [None]:
display(symmetric_protein)

## Infilling

Many protein design tasks, including imputation of missing structural data, redesign of an enzyme scaffold given an active site, and redesign of the CDRs of a known antibody framework, require exact specification of the known structural coordinates. The substructure conditioner enables this type of design. By specifying a protein to redesign and the set of residues that are designable, the user can perform infilling. In this example, a plane split cuts the protein into two portions: a designable portion and a fixed portion. The following parameters can be set by the user:

* `MASK_PERCENT`: the fraction of the protein to redesign.
* `PDB_ID`: the PDB ID of the protein to use for infilling. The `TESTED_PDBS` list provides a set of examples you can use.
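The idea of a plane split can be sketched in a few lines: project every residue onto a random direction and mark the residues on one side of a cut as designable. This is a minimal illustration under our own assumptions, not the implementation of `plane_split_protein`; the function `plane_split` and the random toy coordinates are hypothetical.

```python
import random

def plane_split(coords, mask_fraction, seed=0):
    """Return sorted indices of residues on one side of a random plane.

    Hypothetical sketch: the cut is placed so that roughly `mask_fraction`
    of the residues fall on the designable side.
    """
    rng = random.Random(seed)
    # Random plane normal; it is not normalized because only the ordering matters.
    normal = [rng.gauss(0.0, 1.0) for _ in range(3)]
    projections = [sum(n * c for n, c in zip(normal, xyz)) for xyz in coords]
    order = sorted(range(len(coords)), key=lambda i: projections[i])
    num_design = round(mask_fraction * len(coords))
    return sorted(order[:num_design])

# Toy "protein": 100 residue positions scattered in a box.
rng = random.Random(1)
coords = [tuple(rng.uniform(-20.0, 20.0) for _ in range(3)) for _ in range(100)]

MASK_PERCENT = 0.5
designable = plane_split(coords, MASK_PERCENT)
designable_set = set(designable)
fixed = [i for i in range(len(coords)) if i not in designable_set]
print(len(designable), len(fixed))
```

Because the split is spatial rather than sequential, the designable residues are usually scattered across several stretches of the sequence.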

In [None]:
TESTED_PDBS = ['3bdi', '5sv5','6qaz','2e0q','5xb0','6bde','1a8q','5o0t','1drf','1shg']
MASK_PERCENT = 0.5 # Allow about 50% of the Protein to be designed
PDB_ID = TESTED_PDBS[0]

In [None]:
# Configure Substructure Conditioner
from chroma.utility.chroma import plane_split_protein
protein = Protein(PDB_ID, canonicalize=True, device=device)

X, C, _ = protein.to_XCS()
residues_to_design = plane_split_protein(X, C, protein, MASK_PERCENT).nonzero()[:,1].tolist()
protein.sys.save_selection(gti=residues_to_design, selname="infilling_selection")

conditioner = conditioners.SubstructureConditioner(
        protein,
        backbone_model=chroma.backbone_network,
        selection = 'namesel infilling_selection').to(device)

In [None]:
# Draw a Sample
torch.manual_seed(0)
infilled_protein = chroma.sample(
             protein_init=protein,
             conditioner=conditioner,
             langevin_factor=4.0,
             langevin_isothermal=True,
             inverse_temperature=8.0,
             sde_func='langevin',
             steps=500)

In [None]:
display(infilled_protein)

## Shape

The shape conditioner enforces adherence to a predefined volumetric shape represented as a point cloud. In the example below, we use the Python Imaging Library to render a 3D point cloud from a letter, and then use the `ShapeConditioner` to sample backbones consistent with this point cloud. For faster feedback, the number of steps has been decreased from that used in the manuscript. In this example, the user can set both the choice of letter and the number of protein residues that fill the point cloud:

* `LETTER`: a single-character string containing the letter that will be made by the conditioner.
* `NUM_RESIDUES`: the number of protein residues to fill the point cloud.
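As a rough picture of what a letter-shaped point cloud looks like, the sketch below extrudes a hand-drawn ASCII glyph into 3D points. It only illustrates the data structure; it does not reproduce `letter_to_point_cloud`, which renders the glyph with an imaging library. The glyph `GLYPH_G` and the function `glyph_to_point_cloud` are hypothetical.

```python
# Hand-drawn 5x6 bitmap of the letter "G"; '#' marks filled pixels.
GLYPH_G = [
    " #### ",
    "#     ",
    "#  ###",
    "#    #",
    " #### ",
]

def glyph_to_point_cloud(rows, depth=2, spacing=1.0):
    """Extrude filled pixels along z to get a list of (x, y, z) points.

    Hypothetical helper for illustration; not part of the Chroma API.
    """
    points = []
    height = len(rows)
    for r, row in enumerate(rows):
        for c, ch in enumerate(row):
            if ch == "#":
                for d in range(depth):
                    # Flip the row index so the glyph is upright along y.
                    points.append((c * spacing, (height - 1 - r) * spacing, d * spacing))
    return points

cloud = glyph_to_point_cloud(GLYPH_G)
print(len(cloud), cloud[:3])
```

A conditioner that consumes such a cloud still has to scale it to the protein, which is presumably the role of `autoscale_num_residues` in the cell below.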


In [None]:
LETTER = "G"
NUM_RESIDUES = 1000

In [None]:
# Configure Shape Conditioner
from chroma.utility.chroma import letter_to_point_cloud
letter_point_cloud = letter_to_point_cloud(LETTER)

conditioner = conditioners.ShapeConditioner(
        letter_point_cloud,
        chroma.backbone_network.noise_schedule,
        autoscale_num_residues=NUM_RESIDUES).to(device)

In [None]:
# Draw a Sample
torch.manual_seed(0)
shaped_protein = chroma.sample(chain_lengths=[NUM_RESIDUES], conditioner=conditioner)

In [None]:
display(shaped_protein)

## CATH class

Proteins can be conditionally generated with specified folds according to CATH class annotations. This conditioner uses the ProClass model. Below we show a minimal example that conditions on the immunoglobulin-like annotation `2.60.40`, which belongs to the mostly beta class.

The ProClass Conditioner can set CATH class annotations at 3 levels.

* `CATH_ANNOTATION`: `X`, e.g. `2`, selects a C level annotation, in this case "Mostly Beta".
* `CATH_ANNOTATION`: `X.X`, e.g. `2.60`, selects a CA level annotation, in this case "Sandwich".
* `CATH_ANNOTATION`: `X.X.X`, e.g. `2.60.40`, selects a CAT level annotation, in this case "Immunoglobulin-like".

In general, C level annotations are the most robust. CA and CAT level annotations typically require many more samples to get good results. See the experiments in the paper for details.
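The three levels can be told apart by counting the dot-separated fields of the annotation string. The small helper below is a sketch for checking a value before passing it to the conditioner; `cath_level` is a hypothetical name and is not part of the Chroma API.

```python
def cath_level(annotation):
    """Return "C", "CA", or "CAT" for an annotation such as "2", "2.60", or "2.60.40".

    Hypothetical validation helper for illustration only.
    """
    parts = annotation.split(".")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a C, CA, or CAT annotation: {annotation!r}")
    return ("C", "CA", "CAT")[len(parts) - 1]

for annotation in ("2", "2.60", "2.60.40"):
    print(annotation, cath_level(annotation))
```

A check like this makes it easy to see when a run uses the less robust CA or CAT levels and may need more samples.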

In [None]:
CATH_ANNOTATION = '2.60.40'

In [None]:
# Draw a Sample
torch.manual_seed(0)
conditioner = conditioners.ProClassConditioner('cath', CATH_ANNOTATION)
cath_conditioned_protein = chroma.sample(conditioner=conditioner)

In [None]:
display(cath_conditioned_protein)

## Natural language

Here, we demonstrate backbone generation conditioned on natural language prompts. The sampling is guided by the gradients of a structure-to-text model. To condition, we define a `ProCapConditioner` with the following inputs:
- a caption
- the chain ID, specifying the (1-indexed) chain the caption refers to; captions corresponding to the entire protein can be indicated with `chain_id = -1`
- the weight with which to use the conditioner

Training was performed with individual chain captions drawn from UniProt, and complex-level captions taken from the PDB.

Below, we demonstrate caption-guided sampling to obtain a single-chain backbone corresponding to an SH2 domain.

In [None]:
CAPTION = "Crystal structure of SH2 domain"

In [None]:
# Draw a Sample
torch.manual_seed(0)
conditioner = conditioners.ProCapConditioner(CAPTION, -1)
caption_conditioned_protein = chroma.sample(steps=200, chain_lengths=[110], conditioner=conditioner)

In [None]:
display(caption_conditioned_protein)