Usage
=====

Quick Start
~~~~~~~~~~~

Predict a new network using a trained model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pre-trained models can be downloaded from [TBD].

Candidate pairs should be in tab-separated (``.tsv``) format with no header and with columns for [protein name 1] and [protein name 2].
Optionally, a third column with [label] can be provided, so that training or test data files can be used directly. The label does not affect the predictions.

.. code-block:: bash

    dscript predict --pairs [input data] --seqs [sequences, .fasta format] --model [model file]

Embed sequences with language model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sequences should be in ``.fasta`` format.

.. code-block:: bash

    dscript embed --seqs [sequences] --outfile [embedding file]

Train and save a model
^^^^^^^^^^^^^^^^^^^^^^

Training and validation data should be in tab-separated (``.tsv``) format with no header and with columns for [protein name 1], [protein name 2] and [label].

.. code-block:: bash

    dscript train --train [training data] --val [validation data] --embedding [embedding file] --save-prefix [prefix]

Evaluate a trained model
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    dscript eval --model [model file] --test [test data] --embedding [embedding file] --outfile [result file]

Prediction
~~~~~~~~~~

.. code-block:: bash

    usage: dscript predict [-h] --pairs PAIRS --model MODEL [--seqs SEQS]
                           [--embeddings EMBEDDINGS] [-o OUTFILE] [-d DEVICE]
                           [--thresh THRESH]

    Make new predictions with a pre-trained model. One of --seqs and --embeddings
    is required.

    optional arguments:
      -h, --help            show this help message and exit
      --pairs PAIRS         Candidate protein pairs to predict
      --model MODEL         Pretrained Model
      --seqs SEQS           Protein sequences in .fasta format
      --embeddings EMBEDDINGS
                            h5 file with embedded sequences
      -o OUTFILE, --outfile OUTFILE
                            File for predictions
      -d DEVICE, --device DEVICE
                            Compute device to use
      --thresh THRESH       Positive prediction threshold - used to store contact
                            maps and predictions in a separate file. [default:
                            0.5]

Embedding
~~~~~~~~~

.. code-block:: bash

    usage: dscript embed [-h] --seqs SEQS --outfile OUTFILE [-d DEVICE]

    Generate new embeddings using pre-trained language model

    optional arguments:
      -h, --help            show this help message and exit
      --seqs SEQS           Sequences to be embedded
      --outfile OUTFILE     h5 file to write results
      -d DEVICE, --device DEVICE
                            Compute device to use

Training
~~~~~~~~

.. code-block:: bash

    usage: dscript train [-h] --train TRAIN --val VAL --embedding EMBEDDING
                         [--augment] [--projection-dim PROJECTION_DIM]
                         [--dropout-p DROPOUT_P] [--hidden-dim HIDDEN_DIM]
                         [--kernel-width KERNEL_WIDTH] [--use-w]
                         [--pool-width POOL_WIDTH]
                         [--negative-ratio NEGATIVE_RATIO]
                         [--epoch-scale EPOCH_SCALE] [--num-epochs NUM_EPOCHS]
                         [--batch-size BATCH_SIZE] [--weight-decay WEIGHT_DECAY]
                         [--lr LR] [--lambda LAMBDA_] [-o OUTFILE]
                         [--save-prefix SAVE_PREFIX] [-d DEVICE]
                         [--checkpoint CHECKPOINT]

    Train a new model

    optional arguments:
      -h, --help            show this help message and exit

    Data:
      --train TRAIN         Training data
      --val VAL             Validation data
      --embedding EMBEDDING
                            h5 file with embedded sequences
      --augment             Set flag to augment data by adding (B A) for all pairs
                            (A B)

    Projection Module:
      --projection-dim PROJECTION_DIM
                            Dimension of embedding projection layer (default: 100)
      --dropout-p DROPOUT_P
                            Parameter p for embedding dropout layer (default: 0.5)

    Contact Module:
      --hidden-dim HIDDEN_DIM
                            Number of hidden units for comparison layer in contact
                            prediction (default: 50)
      --kernel-width KERNEL_WIDTH
                            Width of convolutional filter for contact prediction
                            (default: 7)

    Interaction Module:
      --use-w               Use weight matrix in interaction prediction model
      --pool-width POOL_WIDTH
                            Size of max-pool in interaction model (default: 9)

    Training:
      --negative-ratio NEGATIVE_RATIO
                            Number of negative training samples for each positive
                            training sample (default: 10)
      --epoch-scale EPOCH_SCALE
                            Report heldout performance every this many epochs
                            (default: 5)
      --num-epochs NUM_EPOCHS
                            Number of epochs (default: 100)
      --batch-size BATCH_SIZE
                            Minibatch size (default: 25)
      --weight-decay WEIGHT_DECAY
                            L2 regularization (default: 0)
      --lr LR               Learning rate (default: 0.001)
      --lambda LAMBDA_      Weight on the similarity objective (default: 0.35)

    Output and Device:
      -o OUTPUT, --output OUTPUT
                            Output file path (default: stdout)
      --save-prefix SAVE_PREFIX
                            Path prefix for saving models
      -d DEVICE, --device DEVICE
                            Compute device to use
      --checkpoint CHECKPOINT
                            Checkpoint model to start training from

Evaluation
~~~~~~~~~~

.. code-block:: bash

    usage: dscript eval [-h] --model MODEL --test TEST --embedding EMBEDDING
                        [-o OUTFILE] [-d DEVICE]

    Evaluate a trained model

    optional arguments:
      -h, --help            show this help message and exit
      --model MODEL         Trained prediction model
      --test TEST           Test Data
      --embedding EMBEDDING
                            h5 file with embedded sequences
      -o OUTFILE, --outfile OUTFILE
                            Output file to write results
      -d DEVICE, --device DEVICE
                            Compute device to use
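
The pairs files used by ``dscript train``, ``dscript eval`` and ``dscript predict`` are headerless and tab-separated, with two columns of protein names plus a label column where required. A row with the wrong number of columns is easy to miss, for example when fields are separated by spaces instead of tabs. So is a protein name with no matching sequence. The shell functions below are a minimal sketch for catching both problems before a run. ``check_pairs_tsv`` and ``missing_fasta_ids`` are hypothetical helpers and are not part of ``dscript``. ``missing_fasta_ids`` also assumes that a sequence's name is the FASTA header text after ``>`` up to the first whitespace. That matching rule is an assumption, not documented ``dscript`` behaviour.

.. code-block:: bash

    # check_pairs_tsv (hypothetical helper, not part of dscript)
    #   $1: headerless, tab-separated pairs file
    #   $2: expected number of columns (2 = names only, 3 = names + label)
    # Reports each bad row on stderr, prints the number of bad rows on stdout,
    # and returns non-zero if any row has the wrong number of columns.
    check_pairs_tsv() {
        local file="$1" ncol="$2"
        awk -F '\t' -v n="$ncol" '
            NF != n {
                bad++
                printf("line %d: expected %d columns, found %d\n", NR, n, NF) > "/dev/stderr"
            }
            END {
                print bad + 0
                exit (bad > 0)
            }
        ' "$file"
    }

    # missing_fasta_ids (hypothetical helper, not part of dscript)
    #   $1: pairs file, $2: FASTA file
    # Prints each protein name from the first two columns of the pairs file
    # that has no FASTA record, once per name.
    # ASSUMPTION: a record's name is the header text after ">" up to the
    # first space or tab.
    missing_fasta_ids() {
        local pairs="$1" fasta="$2"
        awk -F '\t' '
            # First file (the FASTA): collect the name from every header line.
            FILENAME == ARGV[1] {
                if (sub(/^>/, "")) {
                    split($0, id, "[ \t]+")
                    seen[id[1]] = 1
                }
                next
            }
            # Second file (the pairs): report names with no FASTA header.
            {
                for (i = 1; i <= 2 && i <= NF; i++) {
                    if (!($i in seen) && !($i in reported)) {
                        print $i
                        reported[$i] = 1
                    }
                }
            }
        ' "$fasta" "$pairs"
    }

For example, ``check_pairs_tsv [test data] 3 && dscript eval --model [model file] --test [test data] --embedding [embedding file]`` starts the evaluation only if every row of the test file has exactly three columns. ``dscript predict`` accepts files with or without a label column, so pass the column count your file actually uses.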