# Chroma quickstart

First, run the [setup cell](#setup) below. Then, run [this cell](#unconditional-chain) to get a Chroma sample. Further examples are below.

In [2]:
# @title Setup

# @markdown [Get your API key here](https://chroma-weights.generatebiomedicines.com) and enter it below before running.

import os

os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
import contextlib

api_key = ""  # @param {type:"string"}


import torch

# torch.use_deterministic_algorithms(False)

import warnings
from tqdm import tqdm, TqdmExperimentalWarning

warnings.filterwarnings("ignore", category=TqdmExperimentalWarning)
from functools import partialmethod

tqdm.__init__ = partialmethod(tqdm.__init__, leave=False)

import ipywidgets as widgets


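from google.colab import files  # Colab-only helper used for downloads
from IPython.display import display


def create_button(filename, description=""):
    # Show a button that downloads `filename` when clicked.
    button = widgets.Button(description=description)
    display(button)

    def on_button_click(b):
        files.download(filename)

    button.on_click(on_button_click)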


def render(protein, trajectories, output="protein.cif"):
    display(protein)
    print(protein)
    protein.to_CIF(output)
    traj_output = output.replace(".cif", "_trajectory.cif")
    trajectories["trajectory"].to_CIF(traj_output)
    create_button(output, description="Download sample")
    create_button(traj_output, description="Download trajectory")


import locale

locale.getpreferredencoding = lambda: "UTF-8"

from chroma import Chroma, Protein, conditioners
from chroma.models import graph_classifier, procap
from chroma.utility.api import register_key
from chroma.utility.chroma import letter_to_point_cloud, plane_split_protein

register_key(api_key)
device = "cuda"
with contextlib.redirect_stdout(None):
    chroma = Chroma(device=device)
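
The setup cell stores the API key as a literal string, which is easy to leak when the notebook is shared. A minimal sketch of reading it from the environment instead, assuming a variable named `CHROMA_API_KEY` and a helper `load_api_key` (both hypothetical names chosen for illustration, not something Chroma defines):

```python
import os


def load_api_key(env_var="CHROMA_API_KEY"):
    """Return the API key stored in `env_var`, or raise if it is missing.

    `CHROMA_API_KEY` is a hypothetical name chosen for this sketch;
    Chroma itself does not define it.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before running the setup cell.")
    return key
```

The result could then be passed to `register_key` in place of the literal, for example `register_key(load_api_key())`.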



In [3]:
# @title Get a protein! {display-mode: "form"}

# @markdown Specify the desired length. Chroma will output a fully designed single chain protein.
# @markdown As with all examples in this notebook, the trajectory can also be downloaded.

length = 160 # @param {type:"slider", min:50, max:250, step:10}

protein, trajectories = chroma.sample(
    chain_lengths=[length], steps=200, full_output=True
)
render(protein, trajectories, output="protein.cif")


Protein: system
> Chain A (160 residues)
MRIEARTPEAARRAVDLAIKLKEKGYEVLLVLIGDPSNPELLEIARRLAEAGAKIRVIALVDDSPEAQAGVERLRQVCEELREKGADVELDVITAPLDDPEAQQRARELAEKYISEGEEEAKKKNKPFILILVRPSTDEEEAQREADEAEKKIEEYLKSL
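
`render` derives the trajectory filename from `output` with `str.replace`. That call rewrites every occurrence of `.cif`, and it leaves a name without that suffix unchanged, in which case the trajectory is written over the sample. A sketch of a stricter alternative using `pathlib`; `trajectory_path` is a hypothetical helper, not part of Chroma:

```python
from pathlib import Path


def trajectory_path(output):
    """Return `<stem>_trajectory<suffix>` alongside `output`.

    Hypothetical helper: matches what `render` produces for names ending
    in ".cif", but only touches the final suffix.
    """
    path = Path(output)
    return str(path.with_name(f"{path.stem}_trajectory{path.suffix}"))
```

For `"protein.cif"` this gives `"protein_trajectory.cif"`, the same name `render` produces, while a suffix-free name such as `"sample"` becomes `"sample_trajectory"` instead of colliding with the sample file.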





## Conditional generation

After running the setup at the top of the notebook, all examples are completely independent.

[Single chain](#unconditional-chain): the simplest example of protein generation with Chroma.

[Complex](#unconditional-complex): a protein with multiple chains.

[Symmetry](#symmetry): a symmetric complex, where the symmetry group and subunit size can be input by the user.

[Substructure](#substructure): infilling a PDB structure, where the residues to design can be specified by a PyMOL-style string.

[Shape](#shape): Chroma generation conditioned on shape, using letters as an example.

[Topology](#proclass-chain): chain-level conditioning using ProClass, where a CAT code can be specified.

[Secondary structure](#proclass-residue): ProClass can also condition on secondary structure, given as a per-residue string.

[Natural language](#procap): ProCap conditions Chroma generation on a user-supplied caption.



In [None]:
# @title Complexes {display-mode: "form"}

# @markdown Given the lengths of individual chains, Chroma can generate a complex.

chain1_length = 400 # @param {type:"slider", min:100, max:500, step:10}
chain2_length = 100 # @param {type:"slider", min:0, max:200, step:10}
chain3_length = 100 # @param {type:"slider", min:0, max:200, step:10}
chain4_length = 100 # @param {type:"slider", min:0, max:200, step:10}

protein, trajectories = chroma.sample(
    chain_lengths=[chain1_length, chain2_length, chain3_length, chain4_length],
    steps=200,
    full_output=True,
)
render(protein, trajectories, output="complex.cif")
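
The sliders for chains 2 to 4 go down to 0, so the list passed as `chain_lengths` can contain zeros. How `chroma.sample` treats a zero-length chain is not shown in this notebook; a cautious sketch is to drop such entries first (`nonzero_chain_lengths` is a hypothetical helper):

```python
def nonzero_chain_lengths(*lengths):
    """Keep only positive chain lengths, preserving their order."""
    kept = [int(n) for n in lengths if int(n) > 0]
    if not kept:
        raise ValueError("At least one chain must have a positive length.")
    return kept
```

The filtered list could then be passed as `chain_lengths=nonzero_chain_lengths(chain1_length, chain2_length, chain3_length, chain4_length)`.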

In [None]:
# @title Symmetry {display-mode: "form"}

# @markdown Specify the desired symmetry type and the size of a single subunit.

symmetry_group = "C_7" # @param ["C_2", "C_3", "C_4", "C_5", "C_6", "C_7", "C_8", "D_2", "D_3", "D_4", "D_5", "D_6", "D_7", "D_8", "T", "O", "I"]
subunit_size = 100 # @param {type:"slider", min:10, max:150, step:5}
knbr = 2

conditioner = conditioners.SymmetryConditioner(
    G=symmetry_group, num_chain_neighbors=knbr
)
symmetric_protein, trajectories = chroma.sample(
    chain_lengths=[subunit_size],
    conditioner=conditioner,
    langevin_factor=8,
    inverse_temperature=8,
    sde_func="langevin",
    potts_symmetry_order=conditioner.potts_symmetry_order,
    full_output=True,
)
render(symmetric_protein, trajectories, output="symmetric_protein.cif")
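
The size of the sampled complex grows with the order of the symmetry group: cyclic C_n has n rotations, dihedral D_n has 2n, and the tetrahedral, octahedral, and icosahedral rotation groups have 12, 24, and 60. Assuming the conditioner places one copy of the subunit per group element (an assumption about `SymmetryConditioner`, not something this notebook states), a rough estimate of the total residue count can be sketched as follows. Both function names are hypothetical:

```python
def group_order(symmetry_group):
    """Order of a point group named as in the dropdown above (e.g. "C_7")."""
    polyhedral = {"T": 12, "O": 24, "I": 60}
    if symmetry_group in polyhedral:
        return polyhedral[symmetry_group]
    kind, _, n = symmetry_group.partition("_")
    if kind not in ("C", "D") or not n.isdigit() or int(n) < 1:
        raise ValueError(f"Unrecognized symmetry group: {symmetry_group!r}")
    return int(n) if kind == "C" else 2 * int(n)


def estimated_total_residues(symmetry_group, subunit_size):
    """Subunit size times group order (assumes one subunit per group element)."""
    return group_order(symmetry_group) * subunit_size
```

Under that assumption, the defaults (`C_7`, 100 residues) correspond to 700 residues, while `I` with the same subunit would correspond to 6000, which is worth checking before choosing a large subunit.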

In [9]:
# @title Substructure {display-mode: "form"}

# @markdown Enter a PDB ID and a selection string corresponding to designable positions.
# @markdown Using a substructure conditioner, Chroma can design at these positions while holding the rest of the structure fixed.
# @markdown The default selection cuts the protein in half and fills it in.
# @markdown Other selections, by position or proximity, are also allowed.

pdb_id = "5SV5" # @param ['5SV5', '6QAZ', '3BDI'] {allow-input:true}

try:
    protein = Protein.from_PDBID(pdb_id, canonicalize=True, device=device)
    display(protein)
except FileNotFoundError:
    print("Invalid PDB ID! Using 3BDI")
    pdb_id = "3BDI"
    protein = Protein.from_PDBID(pdb_id, canonicalize=True, device=device)

X, C, _ = protein.to_XCS()
selection_string = "namesel infilling_selection" # @param ['namesel infilling_selection', 'z > 16', '(resid 50) around 10'] {allow-input:true}
residues_to_design = plane_split_protein(X, C, protein, 0.5).nonzero()[:, 1].tolist()
protein.sys.save_selection(gti=residues_to_design, selname="infilling_selection")

try:
    conditioner = conditioners.SubstructureConditioner(
        protein, backbone_model=chroma.backbone_network, selection=selection_string
    ).to(device)
except Exception:
    print("Error initializing conditioner! Falling back to masking 50% of residues.")
    selection_string = "namesel infilling_selection"
    conditioner = conditioners.SubstructureConditioner(
        protein,
        backbone_model=chroma.backbone_network,
        selection=selection_string,
        rg=True,
    ).to(device)

infilled_protein, trajectories = chroma.sample(
    protein_init=protein,
    conditioner=conditioner,
    langevin_factor=4.0,
    langevin_isothermal=True,
    inverse_temperature=8.0,
    steps=500,
    sde_func="langevin",
    full_output=True,
)
render(infilled_protein, trajectories, output="infilled_protein.cif")


Split protein by plane, masking 52.38 percent of residues.
Error initializing conditioner! Falling back to masking 50% of residues.


RuntimeError: expected scalar type Double but found Float

In [None]:
print(device)

In [None]:
# @title Shape {display-mode: "form"}

# @markdown Create a protein in the shape of a desired character of arbitrary length.

character = "G" # @param {type:"string"}
if len(character) > 1:
    character = character[:1]
    print(f"Keeping only first character ({character})!")
length = 1000 # @param {type:"slider", min:100, max:1500, step:100}

letter_point_cloud = letter_to_point_cloud(character)
conditioner = conditioners.ShapeConditioner(
    letter_point_cloud,
    chroma.backbone_network.noise_schedule,
    autoscale_num_residues=length,
).to(device)

shaped_protein, trajectories = chroma.sample(
    chain_lengths=[length], conditioner=conditioner, full_output=True
)

render(shaped_protein, trajectories, output="shaped_protein.cif")

In [4]:
# @title Fold {display-mode: "form"}

# @markdown Input a [CATH number](https://cathdb.info/browse) to get chain-level conditioning, e.g. `3.40.50` for a Rossmann fold or `2` for mainly beta.

CATH = "3.40.50" # @param {type:"string"}
length = 130 # @param {type:"slider", min:50, max:250, step:10}

proclass_model = graph_classifier.load_model("named:public", device=device)
conditioner = conditioners.ProClassConditioner("cath", CATH, model=proclass_model)
cath_conditioned_protein, trajectories = chroma.sample(
    conditioner=conditioner, chain_lengths=[length], full_output=True
)
render(cath_conditioned_protein, trajectories, output="cath_conditioned_protein.cif")

Using cached data from /tmp/chroma_weights/3262b44702040b1dcfccd71ebbcf451d/weights.pt
Loaded from cache



Protein: system
> Chain A (130 residues)
MIPPFIPKKLLDELKKLAEKYGATIEFMPFEEAAQKHLSPEALARPIRDLLKELEDKINEAINEFYSLLPKDIEVKPVTLSIVFPEMPEEELKRFIDEIKTLINKVIDEYKSLPKEERQKEALELIKELF
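
A CATH code is a dotted prefix of the hierarchy class, architecture, topology, homologous superfamily, so `2` names a class and `3.40.50` names a topology. A small sketch for checking the string before handing it to the conditioner; `parse_cath` is a hypothetical helper, and which depths the ProClass model actually accepts is not stated in this notebook:

```python
def parse_cath(code):
    """Split a CATH code such as "3.40.50" into its hierarchy levels."""
    parts = code.strip().split(".")
    if not 1 <= len(parts) <= 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"Not a CATH code: {code!r}")
    names = ["class", "architecture", "topology", "homologous_superfamily"]
    # Each level is identified by the dotted prefix up to that depth.
    return {
        name: ".".join(parts[: depth + 1])
        for depth, name in enumerate(names[: len(parts)])
    }
```

For the default `"3.40.50"` the helper reports class `3`, architecture `3.40`, and topology `3.40.50`; a malformed entry such as `"3.40."` raises before any model is loaded.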





In [None]:
# @title Secondary structure {display-mode: "form"}

# @markdown Enter a string to specify residue-level secondary structure conditioning: H = helix, E = strand, T = turn.

SS = "HHHHHHHTTTHHHHHHHTTTEEEEEETTTEEEEEEEETTTTHHHHHHHH" # @param {type:"string"}

proclass_model = graph_classifier.load_model("named:public", device=device)
conditioner = conditioners.ProClassConditioner(
    "secondary_structure", SS, max_norm=None, model=proclass_model
)
ss_conditioned_protein, trajectories = chroma.sample(
    steps=500, conditioner=conditioner, chain_lengths=[len(SS)], full_output=True
)
render(ss_conditioned_protein, trajectories, output="ss_conditioned_protein.cif")
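
Because the chain length is taken from `len(SS)`, a typo in the string changes both the conditioning and the size of the protein. A minimal sketch that validates the string and lists its segments, assuming only the three letters named above are allowed (`summarize_ss` is a hypothetical helper, and normalizing to upper case is a choice made here, not documented Chroma behavior):

```python
from itertools import groupby

ALLOWED = {"H": "helix", "E": "strand", "T": "turn"}


def summarize_ss(ss):
    """Validate an H/E/T string and return its runs as (label, length) pairs."""
    ss = ss.strip().upper()
    bad = sorted(set(ss) - set(ALLOWED))
    if not ss or bad:
        raise ValueError(f"Expected only H, E, or T; got {bad or 'an empty string'}")
    # groupby yields one group per run of identical consecutive characters.
    return [(ALLOWED[ch], len(list(run))) for ch, run in groupby(ss)]
```

For example, `summarize_ss("HHHTTEE")` returns `[("helix", 3), ("turn", 2), ("strand", 2)]`, and the run lengths always sum to the chain length that will be sampled.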

In [5]:
# @title Natural text {display-mode: "form"}

# @markdown ProCap uses natural language captions to condition samples.

length = 110 # @param {type:"slider", min:50, max:250, step:10}
caption = "Crystal structure of SH2 domain" # @param {type:"string"}

procap_model = procap.load_model("named:public", device=device)
conditioner = conditioners.ProCapConditioner(caption, -1, model=procap_model)
caption_conditioned_protein, trajectories = chroma.sample(
    steps=200, chain_lengths=[length], conditioner=conditioner, full_output=True
)
render(
    caption_conditioned_protein, trajectories, output="caption_conditioned_protein.cif"
)

Using cached data from /tmp/chroma_weights/87243729397de5f93afc4f392662d1b5/weights.pt


OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like EleutherAI/gpt-neo-125m is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.