Usage
=====
Quick Start
~~~~~~~~~~~
Predict a new network using a trained model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Pre-trained models can be downloaded from [TBD].
Candidate pairs should be in a tab-separated (``.tsv``) file with no header, with one column each for [protein name 1] and [protein name 2].
An optional third column, [label], may be present, so training or test data files can be used directly as input; the label does not affect the predictions.

.. code-block:: bash

    dscript predict --pairs [input data] --seqs [sequences, .fasta format] --model [model file]

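Before running a prediction, it can help to confirm that the pairs file is well formed and that every protein name has a sequence. The following is a minimal sketch, not part of D-SCRIPT itself: the file names, protein names, and sequences are hypothetical, and it assumes that protein names match the FASTA header text up to the first whitespace.

```shell
# Hypothetical sample files; the names and sequences are made up for illustration.
workdir="$(mktemp -d)"
printf '>P1\nMKTAYIAK\n>P2\nMSDNELQR\n' > "$workdir/seqs.fasta"
printf 'P1\tP2\nP1\tP3\n' > "$workdir/pairs.tsv"

# Count lines that do not have 2 or 3 tab-separated columns.
bad_lines="$(awk -F'\t' 'NF != 2 && NF != 3 { n++ } END { print n + 0 }' "$workdir/pairs.tsv")"

# Collect names from the FASTA headers (text after ">" up to the first
# whitespace; this matching rule is an assumption).
awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$workdir/seqs.fasta" | sort -u > "$workdir/fasta_names.txt"

# Collect names from the first two columns of the pairs file.
cut -f1,2 "$workdir/pairs.tsv" | tr '\t' '\n' | sort -u > "$workdir/pair_names.txt"

# Names that appear in the pairs file but have no sequence.
missing="$(comm -23 "$workdir/pair_names.txt" "$workdir/fasta_names.txt")"

echo "malformed lines: $bad_lines"
echo "names without a sequence: $missing"
```

In this sample, ``P3`` is reported because it appears in the pairs file but not in the FASTA file.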
Embed sequences with language model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Sequences should be in ``.fasta`` format.

.. code-block:: bash

    dscript embed --seqs [sequences] --outfile [embedding file]

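A quick sanity check on the FASTA file before embedding might look like the sketch below. It counts records and lists repeated header names. The file contents are hypothetical, and whether duplicate names cause problems downstream is an assumption rather than something documented here.

```shell
# Hypothetical FASTA file with one repeated record name.
seqs_file="$(mktemp)"
printf '>P1\nMKTAYIAK\n>P2\nMSDNELQR\n>P1\nMKTAYIAK\n' > "$seqs_file"

# Each record starts with a ">" header line.
n_records="$(grep -c '^>' "$seqs_file")"

# Header names that occur more than once.
dup_names="$(grep '^>' "$seqs_file" | sed 's/^>//' | sort | uniq -d)"

echo "records: $n_records"
echo "duplicate names: $dup_names"
```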
Train and save a model
^^^^^^^^^^^^^^^^^^^^^^
Training and validation data should be in tab-separated (``.tsv``) format with no header, and columns for [protein name 1], [protein name 2], [label].

.. code-block:: bash

    dscript train --train [training data] --val [validation data] --embedding [embedding file] --save-prefix [prefix]

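As a rough illustration of the three-column layout, the sketch below builds a small training file and counts its labels. The protein names are hypothetical, and encoding labels as ``1`` (interacting) and ``0`` (non-interacting) is an assumption, since the format above only specifies a [label] column.

```shell
# Hypothetical training file: name 1, name 2, label (assumed to be 0 or 1).
train_file="$(mktemp)"
printf 'P1\tP2\t1\nP1\tP3\t0\nP2\tP3\t0\nP2\tP4\t0\n' > "$train_file"

# Lines that do not have exactly three tab-separated columns.
bad_lines="$(awk -F'\t' 'NF != 3 { n++ } END { print n + 0 }' "$train_file")"

# Number of positive and negative examples.
n_pos="$(awk -F'\t' '$3 == 1 { n++ } END { print n + 0 }' "$train_file")"
n_neg="$(awk -F'\t' '$3 == 0 { n++ } END { print n + 0 }' "$train_file")"

echo "malformed lines: $bad_lines"
echo "positives: $n_pos, negatives: $n_neg"
```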
Evaluate a trained model
^^^^^^^^^^^^^^^^^^^^^^^^
.. code-block:: bash

    dscript eval --model [model file] --test [test data] --embedding [embedding file] --outfile [result file]

Prediction
~~~~~~~~~~
.. code-block:: bash

    usage: dscript predict [-h] --pairs PAIRS --model MODEL [--seqs SEQS]
                           [--embeddings EMBEDDINGS] [-o OUTFILE] [-d DEVICE]
                           [--thresh THRESH]

    Make new predictions with a pre-trained model. One of --seqs and --embeddings is required.

    optional arguments:
      -h, --help            show this help message and exit
      --pairs PAIRS         Candidate protein pairs to predict
      --model MODEL         Pretrained Model
      --seqs SEQS           Protein sequences in .fasta format
      --embeddings EMBEDDINGS
                            h5 file with embedded sequences
      -o OUTFILE, --outfile OUTFILE
                            File for predictions
      -d DEVICE, --device DEVICE
                            Compute device to use
      --thresh THRESH       Positive prediction threshold - used to store contact
                            maps and predictions in a separate file. [default:
                            0.5]

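To make the idea of a positive prediction threshold concrete, the sketch below filters a predictions file at 0.5. It only illustrates thresholding and is not a description of the tool's output: the three-column layout (two protein names and a score), the sample values, and the use of ``>=`` rather than ``>`` are all assumptions.

```shell
# Hypothetical predictions file: name 1, name 2, score (assumed layout).
preds_file="$(mktemp)"
positives_file="$(mktemp)"
printf 'P1\tP2\t0.91\nP1\tP3\t0.12\nP2\tP3\t0.50\n' > "$preds_file"

# Keep rows whose score reaches the threshold.
thresh=0.5
awk -F'\t' -v thresh="$thresh" '($3 + 0) >= (thresh + 0)' "$preds_file" > "$positives_file"

n_positive="$(wc -l < "$positives_file" | tr -d ' ')"
echo "pairs at or above $thresh: $n_positive"
```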
Embedding
~~~~~~~~~
.. code-block:: bash

    usage: dscript embed [-h] --seqs SEQS --outfile OUTFILE [-d DEVICE]

    Generate new embeddings using pre-trained language model

    optional arguments:
      -h, --help            show this help message and exit
      --seqs SEQS           Sequences to be embedded
      --outfile OUTFILE     h5 file to write results
      -d DEVICE, --device DEVICE
                            Compute device to use

Training
~~~~~~~~
.. code-block:: bash

    usage: dscript train [-h] --train TRAIN --val VAL --embedding EMBEDDING
                         [--augment] [--projection-dim PROJECTION_DIM]
                         [--dropout-p DROPOUT_P] [--hidden-dim HIDDEN_DIM]
                         [--kernel-width KERNEL_WIDTH] [--use-w]
                         [--pool-width POOL_WIDTH]
                         [--negative-ratio NEGATIVE_RATIO]
                         [--epoch-scale EPOCH_SCALE] [--num-epochs NUM_EPOCHS]
                         [--batch-size BATCH_SIZE] [--weight-decay WEIGHT_DECAY]
                         [--lr LR] [--lambda LAMBDA_] [-o OUTFILE]
                         [--save-prefix SAVE_PREFIX] [-d DEVICE]
                         [--checkpoint CHECKPOINT]

    Train a new model

    optional arguments:
      -h, --help            show this help message and exit

    Data:
      --train TRAIN         Training data
      --val VAL             Validation data
      --embedding EMBEDDING
                            h5 file with embedded sequences
      --augment             Set flag to augment data by adding (B A) for all pairs
                            (A B)

    Projection Module:
      --projection-dim PROJECTION_DIM
                            Dimension of embedding projection layer (default: 100)
      --dropout-p DROPOUT_P
                            Parameter p for embedding dropout layer (default: 0.5)

    Contact Module:
      --hidden-dim HIDDEN_DIM
                            Number of hidden units for comparison layer in contact
                            prediction (default: 50)
      --kernel-width KERNEL_WIDTH
                            Width of convolutional filter for contact prediction
                            (default: 7)

    Interaction Module:
      --use-w               Use weight matrix in interaction prediction model
      --pool-width POOL_WIDTH
                            Size of max-pool in interaction model (default: 9)

    Training:
      --negative-ratio NEGATIVE_RATIO
                            Number of negative training samples for each positive
                            training sample (default: 10)
      --epoch-scale EPOCH_SCALE
                            Report heldout performance every this many epochs
                            (default: 5)
      --num-epochs NUM_EPOCHS
                            Number of epochs (default: 100)
      --batch-size BATCH_SIZE
                            Minibatch size (default: 25)
      --weight-decay WEIGHT_DECAY
                            L2 regularization (default: 0)
      --lr LR               Learning rate (default: 0.001)
      --lambda LAMBDA_      Weight on the similarity objective (default: 0.35)

    Output and Device:
      -o OUTFILE, --outfile OUTFILE
                            Output file path (default: stdout)
      --save-prefix SAVE_PREFIX
                            Path prefix for saving models
      -d DEVICE, --device DEVICE
                            Compute device to use
      --checkpoint CHECKPOINT
                            Checkpoint model to start training from

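The ``--augment`` flag is described as adding (B A) for every pair (A B). The sketch below shows what that transformation looks like on a small file. It illustrates the idea only and is not the tool's implementation; the file contents are hypothetical.

```shell
# Hypothetical training pairs: name 1, name 2, label.
train_file="$(mktemp)"
augmented_file="$(mktemp)"
printf 'P1\tP2\t1\nP3\tP4\t0\n' > "$train_file"

# Emit each pair (A B) followed by its reverse (B A) with the same label.
awk -F'\t' 'BEGIN { OFS = "\t" } { print $1, $2, $3; print $2, $1, $3 }' \
    "$train_file" > "$augmented_file"

n_augmented="$(wc -l < "$augmented_file" | tr -d ' ')"
echo "pairs after augmentation: $n_augmented"
```

The augmented file has twice as many rows as the input, with each reversed pair directly after its original.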
Evaluation
~~~~~~~~~~
.. code-block:: bash

    usage: dscript eval [-h] --model MODEL --test TEST --embedding EMBEDDING
                        [-o OUTFILE] [-d DEVICE]

    Evaluate a trained model

    optional arguments:
      -h, --help            show this help message and exit
      --model MODEL         Trained prediction model
      --test TEST           Test Data
      --embedding EMBEDDING
                            h5 file with embedded sequences
      -o OUTFILE, --outfile OUTFILE
                            Output file to write results
      -d DEVICE, --device DEVICE
                            Compute device to use