Usage
=====

Quick Start
~~~~~~~~~~~

Predict a new network using a trained model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pre-trained models can be downloaded from [TBD].
Candidate pairs should be in a tab-separated (``.tsv``) file with no header and two columns: [protein name 1] and [protein name 2].
A third [label] column is optional, so training or test data files can be used as input unchanged; the label does not affect the predictions.

.. code-block:: bash

    dscript predict --pairs [input data] --seqs [sequences, .fasta format] --model [model file]
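
The candidate pairs file can be produced with any tool that writes tab-separated text. The following is a minimal Python sketch, not part of D-SCRIPT: the protein names, the helper ``write_candidate_pairs``, and the file name ``candidates.tsv`` are all hypothetical, and the names are assumed to match those in the ``.fasta`` file.

.. code-block:: python

    import csv
    import itertools
    import os
    import tempfile


    def write_candidate_pairs(names, path):
        """Write every unordered pair of names as a header-less, two-column TSV."""
        pairs = list(itertools.combinations(names, 2))
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
            writer.writerows(pairs)
        return pairs


    # Hypothetical names, assumed to match the sequence names in the .fasta file.
    names = ["protA", "protB", "protC"]
    pairs_path = os.path.join(tempfile.mkdtemp(), "candidates.tsv")
    pairs = write_candidate_pairs(names, pairs_path)
    print(f"wrote {len(pairs)} candidate pairs to {pairs_path}")

The resulting file can then be passed to ``--pairs``.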

Embed sequences with language model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sequences should be in ``.fasta`` format.

.. code-block:: bash

    dscript embed --seqs [sequences] --outfile [embedding file]
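
Before embedding, it can be useful to check that the sequence file parses as expected. The sketch below is a small stand-alone FASTA reader in Python, not D-SCRIPT's own parser. Treating the first whitespace-delimited token of each header as the protein name is an assumption of this sketch, and the sequences shown are made up.

.. code-block:: python

    def read_fasta(lines):
        """Return {name: sequence} from an iterable of FASTA-formatted lines."""
        records = {}
        name = None
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                header = line[1:].split()
                if not header:
                    raise ValueError("empty FASTA header")
                name = header[0]  # assumption: first token is the protein name
                if name in records:
                    raise ValueError(f"duplicate sequence name: {name}")
                records[name] = []
            elif name is None:
                raise ValueError("sequence data found before the first header")
            else:
                records[name].append(line)
        # Join multi-line sequences into a single string per record.
        return {n: "".join(parts) for n, parts in records.items()}


    # Made-up example records.
    example = [
        ">protA",
        "MKTAYIAK",
        "QRQISFVK",
        ">protB some description",
        "MSDNE",
    ]
    records = read_fasta(example)
    for name, seq in records.items():
        print(name, len(seq))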

Train and save a model
^^^^^^^^^^^^^^^^^^^^^^

Training and validation data should be in tab-separated (``.tsv``) files with no header and three columns: [protein name 1], [protein name 2], and [label].

.. code-block:: bash

    dscript train --train [training data] --val [validation data] --embedding [embedding file] --save-prefix [prefix]
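
If only a single labelled file is available, it has to be split into training and validation sets first. The following Python sketch shows one way to do that. The helper names, the 80/20 split, and the 0/1 label encoding are assumptions made for illustration, not requirements stated by D-SCRIPT.

.. code-block:: python

    import random


    def split_pairs(rows, val_fraction=0.2, seed=0):
        """Shuffle labelled rows and return (train_rows, val_rows)."""
        rows = list(rows)
        random.Random(seed).shuffle(rows)  # fixed seed for a reproducible split
        n_val = int(len(rows) * val_fraction)
        return rows[n_val:], rows[:n_val]


    def to_tsv(rows):
        """Format rows as header-less, tab-separated text."""
        return "".join("\t".join(str(v) for v in row) + "\n" for row in rows)


    # Made-up labelled pairs; labels are assumed to be 0 or 1.
    rows = [
        ("protA", "protB", 1),
        ("protA", "protC", 0),
        ("protB", "protC", 1),
        ("protC", "protD", 0),
        ("protA", "protD", 0),
    ]
    train_rows, val_rows = split_pairs(rows)
    print(to_tsv(train_rows), end="")

The two lists can be written to separate files with ``to_tsv`` and passed to ``--train`` and ``--val``.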


Evaluate a trained model
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: bash

    dscript eval --model [model file] --test [test data] --embedding [embedding file] --outfile [result file]


Prediction
~~~~~~~~~~

.. code-block:: bash

    usage: dscript predict [-h] --pairs PAIRS --model MODEL [--seqs SEQS]
                        [--embeddings EMBEDDINGS] [-o OUTFILE] [-d DEVICE]
                        [--thresh THRESH]

    Make new predictions with a pre-trained model. One of --seqs and --embeddings is required.

    optional arguments:
    -h, --help            show this help message and exit
    --pairs PAIRS         Candidate protein pairs to predict
    --model MODEL         Pretrained Model
    --seqs SEQS           Protein sequences in .fasta format
    --embeddings EMBEDDINGS
                            h5 file with embedded sequences
    -o OUTFILE, --outfile OUTFILE
                            File for predictions
    -d DEVICE, --device DEVICE
                            Compute device to use
    --thresh THRESH       Positive prediction threshold - used to store contact
                            maps and predictions in a separate file. [default:
                            0.5]
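
The effect of a positive prediction threshold such as ``--thresh`` can be illustrated with a short Python sketch. It assumes predictions are available as (protein 1, protein 2, score) tuples with made-up scores. The layout of the prediction file is not described here, and treating a score equal to the threshold as positive is also an assumption.

.. code-block:: python

    def positive_predictions(predictions, thresh=0.5):
        """Keep only the pairs whose score reaches the threshold."""
        # Assumption: the comparison is inclusive (score >= thresh).
        return [(a, b, score) for a, b, score in predictions if score >= thresh]


    # Made-up scores for illustration only.
    predictions = [
        ("protA", "protB", 0.91),
        ("protA", "protC", 0.12),
        ("protB", "protC", 0.50),
    ]
    positives = positive_predictions(predictions)
    print(positives)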

Embedding
~~~~~~~~~

.. code-block:: bash

    usage: dscript embed [-h] --seqs SEQS --outfile OUTFILE [-d DEVICE]

    Generate new embeddings using pre-trained language model

    optional arguments:
    -h, --help            show this help message and exit
    --seqs SEQS           Sequences to be embedded
    --outfile OUTFILE     h5 file to write results
    -d DEVICE, --device DEVICE
                            Compute device to use

Training
~~~~~~~~

.. code-block:: bash

    usage: dscript train [-h] --train TRAIN --val VAL --embedding EMBEDDING
                        [--augment] [--projection-dim PROJECTION_DIM]
                        [--dropout-p DROPOUT_P] [--hidden-dim HIDDEN_DIM]
                        [--kernel-width KERNEL_WIDTH] [--use-w]
                        [--pool-width POOL_WIDTH]
                        [--negative-ratio NEGATIVE_RATIO]
                        [--epoch-scale EPOCH_SCALE] [--num-epochs NUM_EPOCHS]
                        [--batch-size BATCH_SIZE] [--weight-decay WEIGHT_DECAY]
                        [--lr LR] [--lambda LAMBDA_] [-o OUTFILE]
                        [--save-prefix SAVE_PREFIX] [-d DEVICE]
                        [--checkpoint CHECKPOINT]

    Train a new model

    optional arguments:
    -h, --help            show this help message and exit

    Data:
    --train TRAIN         Training data
    --val VAL             Validation data
    --embedding EMBEDDING
                            h5 file with embedded sequences
    --augment             Set flag to augment data by adding (B A) for all pairs
                            (A B)

    Projection Module:
    --projection-dim PROJECTION_DIM
                            Dimension of embedding projection layer (default: 100)
    --dropout-p DROPOUT_P
                            Parameter p for embedding dropout layer (default: 0.5)

    Contact Module:
    --hidden-dim HIDDEN_DIM
                            Number of hidden units for comparison layer in contact
                            prediction (default: 50)
    --kernel-width KERNEL_WIDTH
                            Width of convolutional filter for contact prediction
                            (default: 7)

    Interaction Module:
    --use-w               Use weight matrix in interaction prediction model
    --pool-width POOL_WIDTH
                            Size of max-pool in interaction model (default: 9)

    Training:
    --negative-ratio NEGATIVE_RATIO
                            Number of negative training samples for each positive
                            training sample (default: 10)
    --epoch-scale EPOCH_SCALE
                            Report heldout performance every this many epochs
                            (default: 5)
    --num-epochs NUM_EPOCHS
                            Number of epochs (default: 100)
    --batch-size BATCH_SIZE
                            Minibatch size (default: 25)
    --weight-decay WEIGHT_DECAY
                            L2 regularization (default: 0)
    --lr LR               Learning rate (default: 0.001)
    --lambda LAMBDA_      Weight on the similarity objective (default: 0.35)

    Output and Device:
    -o OUTFILE, --outfile OUTFILE
                            Output file path (default: stdout)
    --save-prefix SAVE_PREFIX
                            Path prefix for saving models
    -d DEVICE, --device DEVICE
                            Compute device to use
    --checkpoint CHECKPOINT
                            Checkpoint model to start training from
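
The ``--augment`` flag is described above as adding (B A) for every pair (A B). A minimal Python sketch of that transformation follows. It is not D-SCRIPT's implementation, and it assumes the reversed pair keeps the label of the original pair.

.. code-block:: python

    def augment_pairs(rows):
        """Return the rows plus a reversed copy (B, A, label) of each (A, B, label)."""
        rows = list(rows)
        # Assumption: the reversed pair inherits the original label.
        return rows + [(b, a, label) for a, b, label in rows]


    # Made-up labelled pairs.
    rows = [("protA", "protB", 1), ("protA", "protC", 0)]
    augmented = augment_pairs(rows)
    print(augmented)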

Evaluation
~~~~~~~~~~

.. code-block:: bash

    usage: dscript eval [-h] --model MODEL --test TEST --embedding EMBEDDING
                        [-o OUTFILE] [-d DEVICE]

    Evaluate a trained model

    optional arguments:
    -h, --help            show this help message and exit
    --model MODEL         Trained prediction model
    --test TEST           Test Data
    --embedding EMBEDDING
                            h5 file with embedded sequences
    -o OUTFILE, --outfile OUTFILE
                            Output file to write results
    -d DEVICE, --device DEVICE
                            Compute device to use