## Setup

No setup is required. Simply fill in the input boxes with the necessary data and click the **Run** button.  
You can find a list of examples at the bottom of the page; clicking on them will autofill the fields for you.  
If the server remains idle for a period, it will enter standby mode. Running a calculation will wake the tool from standby, but note that the first run may take longer due to startup and model loading.

## Input

**Sequence**: Enter the full amino acid sequence to be analyzed in the **Sequence** text box.  
  Note: While wildcard characters (e.g., `-X.B`) can be included, they currently cannot be visualised.

**Substitutions**: Specify the substitutions you wish to test in the **Substitutions** box. The tool supports four running modes based on your input:

- **Single Substitution**: Input one or more substitutions (e.g. `R218K R218W`) to score specific changes.
- **Residue Position**: Provide residue positions to evaluate all possible substitutions at those sites.
- **Same-Length Sequence**: Provide a second sequence of the same length as the input sequence; each residue that differs between the two is scored one by one as an individual substitution.
- **Any Other Input**: For any other input format, a deep mutational scan of the full sequence will be performed.
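
The mode selection above can be pictured with a small sketch. This is not the tool's actual parser: the function name `classify_input`, the returned labels, and the assumption that entries are whitespace-separated (with positions given as plain integers) are illustrative only.

```python
import re

# A substitution token such as R218K: wild-type residue, position, new residue.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def classify_input(sequence: str, substitutions: str) -> str:
    """Guess which running mode a Substitutions entry would trigger.

    Hypothetical helper for illustration; the real tool may parse differently.
    """
    tokens = substitutions.split()
    if tokens and all(SUBSTITUTION.match(t) for t in tokens):
        return "single_substitution"
    if tokens and all(t.isdigit() for t in tokens):
        return "residue_position"
    if len(tokens) == 1 and tokens[0].isalpha() and len(tokens[0]) == len(sequence):
        return "same_length_sequence"
    # Anything else falls back to a deep mutational scan.
    return "deep_mutational_scan"


seq = "MKTAYIAK"
print(classify_input(seq, "K2R A4W"))   # substitutions
print(classify_input(seq, "2 4"))       # positions
print(classify_input(seq, "MRTAYIAK"))  # same-length sequence
print(classify_input(seq, ""))          # fallback
```

Checking the strictest pattern first keeps the fallback to a full scan as the last resort, which matters because a scan is by far the most expensive mode.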

**Model Selection**: Choose an ESM model for calculations from those available on Hugging Face Model Hub.  
  The model `esm2_t33_650M_UR50D` offers a good balance between cost and accuracy [*](https://doi.org/10.1126/science.ade2574).

**Accuracy Option**: The **Use higher accuracy** option applies a masked-marginals scoring strategy, which considers sequence context during inference.  
  While this method is slower, it enhances accuracy. If you experience long runtimes, unchecking this option can significantly speed up calculations at the cost of some accuracy.
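
To illustrate the idea behind masked-marginals scoring, here is a minimal sketch under the usual formulation: the substituted position is masked, and the score is the difference between the model's log-probability for the new residue and for the wild-type residue at that position. The callable `log_probs_at`, the `toy_model` stand-in, and the 1-based position numbering are assumptions made for this example; the tool's own implementation may differ.

```python
import math
from typing import Callable, Dict

# Hypothetical interface: given a sequence with one masked position and the
# 0-based index of that position, return log-probabilities per amino acid.
LogProbFn = Callable[[str, int], Dict[str, float]]

MASK = "<mask>"  # placeholder mask token used in this sketch


def masked_marginal_score(sequence: str, substitution: str,
                          log_probs_at: LogProbFn) -> float:
    """Score a substitution such as 'R2K' (1-based position assumed)."""
    wt, pos, mt = substitution[0], int(substitution[1:-1]), substitution[-1]
    index = pos - 1
    if sequence[index] != wt:
        raise ValueError(f"expected {wt} at position {pos}, found {sequence[index]}")
    # Hide the residue so the prediction relies on sequence context only.
    masked = sequence[:index] + MASK + sequence[index + 1:]
    log_probs = log_probs_at(masked, index)
    # Positive: new residue is more likely than wild type; negative: less likely.
    return log_probs[mt] - log_probs[wt]


def toy_model(masked_sequence: str, index: int) -> Dict[str, float]:
    """Stand-in for a real language model; returns fixed probabilities."""
    return {"R": math.log(0.5), "K": math.log(0.25), "W": math.log(0.125)}


print(masked_marginal_score("MRT", "R2K", toy_model))
print(masked_marginal_score("MRT", "R2W", toy_model))
```

Because each tested position needs its own masked forward pass, this strategy is slower than scoring from a single unmasked pass, which is consistent with the runtime trade-off described above.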

**Deep Mutational Scan Recommendations**: When performing a deep mutational scan, it is advisable to use smaller models (8M, 35M, or 150M parameters) because runtimes can become very long, especially with longer sequences or during peak server usage times.  
  For example, scanning a 300-residue sequence with larger models may require over 30 minutes.  
  Generally, accuracy is more affected by the scoring strategy than by model size; therefore, prioritise reducing model size when optimizing for runtime.  
  The computational cost of the scoring strategy scales with the number of substitutions tested, while model cost scales with wild-type sequence length.

**Concurrent Substitutions**: To calculate the effect of multiple concurrent substitutions, you must manually edit the input sequence and rerun the calculation. Accuracy is not guaranteed, as this use case has not yet been tested.
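
One way to prepare such a modified input sequence is sketched below. The helper `apply_substitutions` is hypothetical and not part of the tool; it assumes 1-based positions and the `R218K`-style notation used above, and it checks the wild-type residue to catch off-by-one mistakes.

```python
def apply_substitutions(sequence: str, substitutions: str) -> str:
    """Return the sequence with every listed substitution applied.

    Hypothetical helper: substitutions are whitespace-separated tokens such
    as 'K2R', with 1-based positions.
    """
    residues = list(sequence)
    for sub in substitutions.split():
        wt, pos, mt = sub[0], int(sub[1:-1]), sub[-1]
        if residues[pos - 1] != wt:
            raise ValueError(
                f"{sub}: expected {wt} at position {pos}, found {residues[pos - 1]}"
            )
        residues[pos - 1] = mt
    return "".join(residues)


# Paste the result into the Sequence box, then test further substitutions.
print(apply_substitutions("MKTAYIAK", "K2R A4W"))  # MRTWYIAK
```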

## Output

Results are displayed in a color-coded table, except for deep mutational scans, which produce a heatmap.  
In the table:

- Beneficial substitutions are highlighted in green with positive values.  
- Detrimental substitutions appear in red with negative values.  

As a rule of thumb, scores with an absolute value of *4* or more are considered significant. For instance:

- A substitution scoring *-6* is likely detrimental to protein functionality.
- A score of *+2* is generally regarded as neutral.
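
The rule of thumb can be written as a small helper when post-processing scores. The function name `interpret_score` and its labels are illustrative, and treating scores of *+4* or more as likely beneficial is an assumption made by symmetry with the negative case.

```python
def interpret_score(score: float, threshold: float = 4.0) -> str:
    """Label a score using the rule-of-thumb threshold (hypothetical helper)."""
    if score <= -threshold:
        return "likely detrimental"
    if score >= threshold:
        # Assumed by symmetry; the guide only gives a negative example.
        return "likely beneficial"
    return "roughly neutral"


for value in (-6.0, 2.0, 4.5):
    print(value, interpret_score(value))
```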

The **Download raw data** button lets you download the output in CSV format.  


**If you use this tool in your research, please cite**:  

Totaro MG, Vide U, Zausinger R, Winkler A, Oberdorfer G. ESM-scan—A tool to guide amino acid substitutions. *Protein Science.* 2024; 33(12):e5221. [doi.org/10.1002/pro.5221](https://doi.org/10.1002/pro.5221)