jfaustin's picture
secretion-scores (#4)
a3f3d91 verified
FreeSASA
========
These pages document the
- @ref CLI
- @ref API "FreeSASA C API"
- @ref Python "FreeSASA Python interface"
- @ref Config-file
- @ref Selection
- @ref Geometry
The library is released under the [MIT license](license.md).
Installation instructions can be found in the [README](README.md) file.
@page CLI Command-line Interface
Building FreeSASA creates the binary `freesasa`, which is installed by
`make install`. Calling
$ freesasa -h
displays a help message listing all options. The following text
explains how to use most of them.
@section CLI-Default Run using defaults
In the following we will use the RNA/protein complex PDB structure
3WBM as an example. It has four protein chains A, B, C and D, and two
RNA strands X and Y. To run a simple SASA calculation using default
parameters, simply type:
$ freesasa 3wbm.pdb
This generates the following output
## FreeSASA 2.0 ##
PARAMETERS
algorithm : Lee & Richards
probe-radius : 1.400
threads : 2
slices : 20
INPUT
source : 3wbm.pdb
chains : ABCDXY
atoms : 3714
RESULTS (A^2)
Total : 25190.77
Apolar : 11552.38
Polar : 13638.39
CHAIN A : 3785.49
CHAIN B : 4342.33
CHAIN C : 3961.12
CHAIN D : 4904.30
CHAIN X : 4156.46
CHAIN Y : 4041.08
The results are all in the unit Ångström-squared.
@section parameters Changing parameters
If higher precision is needed, the command
$ freesasa -n 100 3wbm.pdb
specifies that the calculation should use 100 slices per atom instead of
the default 20. The command
$ freesasa --shrake-rupley -n 200 --probe-radius 1.2 --n-threads 4 3wbm.pdb
instead calculates the SASA using Shrake & Rupley's algorithm with 200
test points, a probe radius of 1.2 Å, using 4 parallel threads to
speed things up.
If the user wants to use their own atomic radii the command
$ freesasa --config-file <file> 3wbm.pdb
Reads a configuration from a file and uses it to assign atomic
radii. The program will halt if it encounters atoms in the PDB input
that are not present in the configuration. See @ref Config-file for
instructions how to write a configuration.
To use the atomic radii from NACCESS call
$ freesasa --radii=naccess 3wbm.pdb
Another way to specify a custom set of atomic radii is to store them as
occupancies in the input PDB file
$ freesasa --radius-from-occupancy 3wbm.pdb
This option allows the user to first use the option `--format=pdb` (see @ref CLI-PDB) to
write generate a PDB file with the radii used in the calculation,
modify the radii of individual atoms in that file, and then recalculate
the SASA with these modified radii.
@section Output Output formats
In addition to the standard output format above FreeSASA can export
the results as @ref CLI-JSON, @ref CLI-XML, @ref CLI-PDB, @ref
CLI-RSA, @ref CLI-RES and @ref CLI-SEQ using the option
`--format`. The level of detail of JSON and XML output can be
controlled with the option `--output-depth=<depth>` which takes the
values `atom`, `residue`, `chain` and `structure`. If `atom` is
chosen, SASA values are shown for all levels of the structure,
including individual atoms. With `chain`, only structure and chain
SASA values are printed (this is the default).
The output can include relative SASA values for each residues. To
calculate these a reference SASA value is needed, calculated using the
same atomic radii. At the moment such values are only available for
the ProtOr and NACCESS radii (selected using the option `--radii`), if
other radii are used relative SASA will be excluded (in RSA output all
REL columns will have the value 'N/A').
The reference SASA values for residue X are calculated from Ala-X-Ala
peptides in a stretched out configuration. The reference
configurations are supplied for reference in the directory
`rsa`. Since these are not always the most exposed possible
configuration, and because bond lengths and bond angles vary, the
relative SASA values will sometimes be larger than 100 %. At the
moment there is no interface to supply user-defined reference values.
@subsection CLI-JSON JSON
The command
$ freesasa --format=xml --output-depth=residue 3wbm.pdb
generates the following
~~~~{.json}
{
"source":"FreeSASA 2.0",
"length-unit":"Ångström",
"results":[
{
"input":"3wbm.pdb",
"classifier":"ProtOr",
"parameters":{
"algorithm":"Lee & Richards",
"probe-radius":1.3999999999999999,
"resolution":20
},
"structures":[
{
"chain-labels":"ABCDXY",
"area":{
"total":25190.768387067546,
"polar":13638.391677017404,
"apolar":11552.376710050148,
"main-chain":3337.1622502425053,
"side-chain":21853.606136825045
},
"chains":[
{
"label":"A",
"n-residues":86,
"area":{
"total":3785.4864049452635,
"polar":1733.8560208488598,
"apolar":2051.6303840964056,
"main-chain":723.34358684348558,
"side-chain":3062.1428181017791
}
"residues":[
{
"name":"THR",
"number":"5",
"area":{
"total":138.48216994006549,
"polar":56.887951514571867,
"apolar":81.594218425493622,
"main-chain":38.898190013033592,
"side-chain":99.583979927031905
},
"relative-area":{
"total":104.05152148175331,
"polar":113.98106895325961,
"apolar":98.093554250413092,
"main-chain":96.330336832673567,
"side-chain":107.414496739329
},
"n-atoms":7
},
...
},
...
]
}
]
}
]
}
~~~~
Where ellipsis indicates the remaining residues and chains.
@subsection CLI-XML XML
The command
$ freesasa --format=xml 3wbm.pdb
Generates the following
~~~~{.xml}
<?xml version="1.0" encoding="UTF-8"?>
<results xmlns="http://freesasa.github.io/" source="FreeSASA 2.0" lengthUnit="&#xC5;ngstr&#xF6;m">
<result classifier="ProtOr" input="3wbm.pdb">
<parameters algorithm="Lee &amp; Richards" probeRadius="1.400000" resolution="20"/>
<structure chains="ABCDXY">
<area total="25190.768" polar="13638.392" apolar="11552.377" mainChain="3337.162" sideChain="21853.606"/>
<chain label="A" nResidues="86">
<area total="3785.486" polar="1733.856" apolar="2051.630" mainChain="723.344" sideChain="3062.143"/>
</chain>
<chain label="B" nResidues="84">
<area total="4342.334" polar="1957.114" apolar="2385.220" mainChain="853.707" sideChain="3488.627"/>
</chain>
<chain label="C" nResidues="86">
<area total="3961.119" polar="1838.724" apolar="2122.395" mainChain="782.652" sideChain="3178.468"/>
</chain>
<chain label="D" nResidues="89">
<area total="4904.298" polar="2332.306" apolar="2571.991" mainChain="977.459" sideChain="3926.838"/>
</chain>
<chain label="X" nResidues="25">
<area total="4156.455" polar="2919.576" apolar="1236.879" mainChain="0.000" sideChain="4156.455"/>
</chain>
<chain label="Y" nResidues="25">
<area total="4041.076" polar="2856.815" apolar="1184.261" mainChain="0.000" sideChain="4041.076"/>
</chain>
</structure>
</result>
</results>
~~~~
@subsection CLI-PDB PDB
The command-line interface can also be used as a PDB filter:
$ cat 3wbm.pdb | freesasa --format=pdb
REMARK 999 This PDB file was generated by FreeSASA 2.0.
REMARK 999 In the ATOM records temperature factors have been
REMARK 999 replaced by the SASA of the atom, and the occupancy
REMARK 999 by the radius used in the calculation.
MODEL 1
ATOM 1 N THR A 5 -19.727 29.259 13.573 1.64 9.44
ATOM 2 CA THR A 5 -19.209 28.356 14.602 1.88 5.01
ATOM 3 C THR A 5 -18.747 26.968 14.116 1.61 0.40
...
The output is a PDB-file where the temperature factors have been
replaced by SASA values (last column), and occupancy numbers by the
radius of each atom (second to last column).
Only the atoms and models used in the calculation will be present in
the output (see @ref Input for how to modify this).
@subsection CLI-RES SASA of each residue type
Calculate the SASA of each residue type:
$ freesasa --format=res 3wbm.pdb
# Residue types in 3wbm.pdb
RES ALA : 251.57
RES ARG : 2868.98
RES ASN : 1218.87
...
RES A : 1581.57
RES C : 2967.12
RES G : 1955.16
RES U : 1693.68
@subsection CLI-SEQ SASA of each residue
Calculate the SASA of each residue in the sequence:
$ freesasa --format=seq 3wbm.pdb
# Residues in 3wbm.pdb
SEQ A 5 THR : 138.48
SEQ A 6 PRO : 25.53
SEQ A 7 THR : 99.42
...
@subsection CLI-RSA RSA
The CLI can also produce output similar to the RSA format from
NACCESS. This format includes both absolute SASA values (ABS) and
relative ones (REL) compared to a precalculated reference max
value. The only significant difference between FreeSASA's RSA output
format and that of NACCESS (except differences in areas due to
different atomic radii), is that FreeSASA will print the value "N/A"
where NACCESS prints "-99.9".
$ freesasa --format=rsa 3wbm.pdb
REM FreeSASA 2.0
REM Absolute and relative SASAs for 3wbm.pdb
REM Atomic radii and reference values for relative SASA: ProtOr
REM Chains: ABCDXY
REM Algorithm: Lee & Richards
REM Probe-radius: 1.40
REM Slices: 20
REM RES _ NUM All-atoms Total-Side Main-Chain Non-polar All polar
REM ABS REL ABS REL ABS REL ABS REL ABS REL
RES THR A 5 138.48 104.1 99.58 107.4 38.90 96.3 81.59 98.1 56.89 114.0
RES PRO A 6 25.53 19.3 11.31 11.0 14.23 47.7 21.67 18.7 3.86 23.9
...
RES GLY A 15 0.64 0.9 0.00 N/A 0.64 0.9 0.00 0.0 0.64 2.0
...
RES U Y 23 165.16 N/A 165.16 N/A 0.00 N/A 52.01 N/A 113.15 N/A
RES C Y 24 165.01 N/A 165.01 N/A 0.00 N/A 46.24 N/A 118.77 N/A
RES C Y 25 262.46 N/A 262.46 N/A 0.00 N/A 85.93 N/A 176.52 N/A
END Absolute sums over single chains surface
CHAIN 1 A 3785.5 3062.1 723.3 2051.6 1733.9
CHAIN 2 B 4342.3 3488.6 853.7 2385.2 1957.1
CHAIN 3 C 3961.1 3178.5 782.7 2122.4 1838.7
CHAIN 4 D 4904.3 3926.8 977.5 2572.0 2332.3
CHAIN 5 X 4156.5 4156.5 0.0 1236.9 2919.6
CHAIN 6 Y 4041.1 4041.1 0.0 1184.3 2856.8
END Absolute sums over all chains
TOTAL 25190.8 21853.6 3337.2 11552.4 13638.4
Note that each `RES` is a single residue, not a residue type as above
(i.e. has the same meaning as `SEQ` above). This unfortunate confusion
of labels is due to RSA support being added much later than the other
options. Fixing it now would break the interface, and will thus
earliest be dealt with in the next major release.
@subsubsection RSA-naccess Using the NACCESS configuration
The reference values for the NACCESS configuration in FreeSASA are not
exactly the same as those that ship with NACCESS, but have been
calculated from scratch using the tripeptides that ship with
FreeSASA. Calling
$ freesasa 3wbm.pdb --format=rsa --radii=naccess
will give an RSA file where the ABS columns should be identical to
NACCESS (if the latter is run with the flag `-b`). REL values will
differ slightly, due to the differences in reference values. NACCESS
also gives different results for the nucleic acid main-chain and
side-chain (possibly due to a bug in NACCESS?). FreeSASA defines the
(deoxy)ribose and phosphate groups as main-chain and the base as
side-chain.
@section CLI-select Selecting groups of atoms
The option `--select` can be used to define groups of atoms whose
integrated SASA we are interested in. It uses a subset of the Pymol
`select` command syntax, see @ref Selection for full
documentation. The following example shows how to calculate the sum of
exposed surface areas of all aromatic residues and of the four chains
A, B, C and D (just the sum of the areas above).
$ freesasa --select "aromatic, resn phe+tyr+trp+his+pro" --select "abcd, chain A+B+C+D" 3wbm.pdb
...
SELECTIONS
freesasa: warning: Found no matches to resn 'TRP', typo?
freesasa: warning: Found no matches to resn 'HIS', typo?
aromatic : 1196.45
abcd : 16993.24
The lines shown above are appended to the regular output. This
particular protein did not have any TRP or HIS residues, hence the
warnings (written to stderr). The warnings can be supressed with the
flag `-w`.
@section Chain-groups Analyzing groups of chains
Calculating the SASA of a given chain or group of chains separately
from the rest of the structure, can be useful for measuring how buried
a chain is in a given structure. The option `--chain-groups` can be
used to do such a separate calculation, calling
$ freesasa --chain-groups=ABCD+XY 3wbm.pdb
produces the regular output for the structure 3WBM, but in addition it
runs a separate calculation for the chains A, B, C and D as though X
and Y aren't in the structure, and vice versa:
PARAMETERS
algorithm : Lee & Richards
probe-radius : 1.400
threads : 2
slices : 20
####################
INPUT
source : 3wbm.pdb
chains : ABCDXY
atoms : 3714
RESULTS (A^2)
Total : 25190.77
Apolar : 11552.38
Polar : 13638.39
CHAIN A : 3785.49
CHAIN B : 4342.33
CHAIN C : 3961.12
CHAIN D : 4904.30
CHAIN X : 4156.46
CHAIN Y : 4041.08
####################
INPUT
source : 3wbm.pdb
chains : ABCD
atoms : 2664
RESULTS (A^2)
Total : 18202.78
Apolar : 9799.46
Polar : 8403.32
CHAIN A : 4243.12
CHAIN B : 4595.18
CHAIN C : 4427.11
CHAIN D : 4937.38
####################
INPUT
source : 3wbm.pdb
chains : XY
atoms : 1050
RESULTS (A^2)
Total : 9396.28
Apolar : 2743.09
Polar : 6653.19
CHAIN X : 4714.45
CHAIN Y : 4681.83
@section Input PDB input
@subsection Hetatom-hydrogen Including extra atoms
The user can ask to include hydrogen atoms and HETATM entries in the
calculation using the options `--hydrogen` and `--hetatm`. In both
cases adding unknown atoms will emit a warning for each atom. This can
either be amended by using the flag `-w` to suppress warnings, or by
using a custom classifier so that they are recognized (see @ref
Config-file).
@subsection Halt-skip Skipping unknown atoms
By default FreeSASA guesses the element of an unknown atom and uses
that elements VdW radius. If this fails the radius is set to 0 (and
hence the atom will not contribute to the calculated area). Users can
request to either skip unknown atoms completely (i.e. no guessing) or
to halt when unknown atoms are found and exit with an error. This is
done with the option `--unknown` which takes one of the three
arguments `skip`, `halt` or `guess` (default). Whenever an unknown
atom is skipped or its radius is guessed a warning is printed to
stderr.
@subsection Chains-models Separating and joining chains and models
If a PDB file has several chains and/or models, by default all chains
of the first model are used, and the rest of the file is ignored. This
behavior can be modified using the following options
- `--join-models`: Joins all models in the input into one large
structure. Useful for biological assembly files were different
locations of the same chain in the oligomer are represented by
different MODEL entries.
- `--separate-models`: Calculate SASA separately for each model in
the input. Useful when the same file contains several
conformations of the same molecule.
- `--separate-chains`: Calculate SASA separately for each chain in
the input. Can be joined with `--separate-models` to calculate
SASA of each chain in each model.
- `--chain-groups`: see @ref Chain-groups
@page API FreeSASA API
@section Basic-API Basics
The API is found in the header [freesasa.h](freesasa_8h.html). The
other source-files and headers in the repository are for internal use,
and are not presented here, but are documented in the source
itself. The file [example.c](example_8c_source.html) contains a simple
program that illustrates how to use the API to read a PDB file from
`stdin` and calculate and print the SASA.
To calculate the SASA of a structure, there are two main options:
1. Initialize a structure from a PDB-file, using either the default
classifier or a custom one to determine the radius of each atom,
and then run the calculation.
2. Provide an array of cartesian coordinates and an array containing
the radii of the corresponding atoms to freesasa\_calc\_coord().
@subsection API-PDB Calculate SASA for a PDB file
The following explains how to use FreeSASA to calculate the SASA of a
fictive PDB file (1abc.pdb). At each step one or more error checks
should have been done, but these are ignored here for brevity. See
the documentation of each function to see what errors can occur.
Default parameters are used at every step, the section @ref
Customizing explains how to configure the calculations.
@subsubsection API-Read-PDB Open PDB file
The function freesasa\_structure\_from\_pdb() reads the atom
coordinates from a PDB file and assigns a radius to each atom. The
third argument can be used to pass options for how to read the PDB
file.
~~~{.c}
FILE *fp = fopen("1abc.pdb");
const freesasa_classifier *classifier = &freesasa_default_classifier;
freesasa_structure *structure = freesasa_structure_from_pdb(fp, classifier, 0);
~~~
@subsubsection API-Calc Perform calculation and get total SASA
Next we use freesasa\_calc\_structure() to calculate SASA using the
structure we just generated, and then print the total area. The argument
`NULL` means use default freesasa_parameters.
~~~{.c}
freesasa_result *result = freesasa_calc_structure(structure, NULL);
printf("Total area: %f A2\n",result->total);
~~~
@subsubsection API-Classes Get polar and apolar area
We are commonly interested in the polar and apolar areas of a
molecule, this can be calculated by freesasa\_result\_classes(). To
get other classes of atoms we can either define our own classifier, or
use freesasa\_select\_area() defined in the next section. The return
type ::freesasa\_nodearea is a struct contains the total area and the
area of all apolar and polar atoms, and main-chain and side-chain
atoms.
~~~{.c}
freesasa_nodearea area = freesasa_result_classes(structure, result);
printf("Total : %f A2\n", area.total);
printf("Apolar : %f A2\n", area.apolar);
printf("Polar : %f A2\n", area.polar);
printf("Main-chain : %f A2\n", area.main_chain);
printf("Side-chain : %f A2\n", area.side_chain);
~~~
@see @ref Classification
@subsubsection API-Select Get area of custom groups of atoms
Groups of atoms can be defined using freesasa\_selection\_new(), which
takes a selection definition uses a subset of the Pymol select syntax
~~~{.c}
freesasa_selection *selection =
freesasa_selection_new("aromatic, resn phe+tyr+trp+his+pro",
structure, result);
printf("Area of selection '%s': %f A2\n",
freesasa_selection_name(selection), freesasa_selection_area(selection);
~~~
@see @ref Selection
@subsubsection structure-node Navigating the results as a tree
In addition to the flat array of results in ::freesasa\_result, and
the global values returned by freesasa\_result\_classes(), FreeSASA
has an interface for navigating the results as a tree. The leaf nodes
are individual atoms, and there are parent nodes at the residue,
chain, and structure levels. The function freesasa\_calc\_tree() does
a SASA calculation and returns the root node of such a tree. (If one
already has a ::freesasa\_result the function freesasa\_tree\_init()
can be used instead). Each node stores a ::freesasa\_nodearea for the
sum of all atoms belonging to the node. The tree can be traversed with
freesasa\_node\_children(), freesasa\_node\_parent() and
freesasa\_node\_next(), and the area, type and name using
freesasa\_node\_area(), freesasa\_node\_type() and
freesasa\_node\_name(). Additionally there are special properties for
each level of the tree.
@see node
@subsubsection export-tree Exporting to RSA, JSON and XML
The tree structure can also be exported to an RSA, JSON or XML file
using freesasa\_tree\_export(). The RSA format is fixed, but the user
can select which levels of the tree to include in JSON and XML. The
following illustrates how one would generate a tree and export it to
XML, including nodes for the whole structure, chains and residues (but
excluding individual atoms).
~~~~{.c}
freesasa_node *tree = freesasa_calc_tree(structure,
&freesasa_default_parameters,
&freesasa_default_classifier);
FILE *file = fopen("output.xml", "w");
freesasa_tree_export(file, tree, FREESASA_XML | FREESASA_OUTPUT_RESIDUE);
fclose(file);
freesasa_node_free(tree);
~~~~
@subsection Coordinates
If users wish to supply their own coordinates and radii, these are
accepted as arrays of doubles passed to the function
freesasa\_calc\_coord(). The coordinate-array should have size 3*n with
coordinates in the order `x1,y1,z1,x2,y2,z2,...,xn,yn,zn`.
~~~{.c}
double coord[] = {1.0, /* x */
2.0, /* y */
3.0 /* z */ };
double radius[] = {2.0};
int n_atoms = 1;
freesasa_result *result = freesasa_calc_coord(coord, radius, n_atoms, NULL);
~~~
@subsection Error-handling
The principle for error handling is that unpredictable errors should
not cause a crash, but rather allow the user to exit gracefully or
make another attempt. Therefore, errors due to user or system
failures, such as faulty parameters, malformatted config-files, I/O
errors or out of memory errors, are reported through return values,
either ::FREESASA\_FAIL or ::FREESASA\_WARN, or by `NULL` pointers,
depending on the context (see the documentation for the individual
functions).
Errors that are attributable to programmers using the library, such as
passing null pointers where not allowed, are checked by asserts.
@subsection Thread-safety
The only global state the library stores is the verbosity level (set
by freesasa\_set\_verbosity()) and the pointer to the error-log
(defaults to `stderr`, can be changed by freesasa\_set\_err\_out()).
It should be clear from the documentation when the other functions
have side effects such as memory allocation and I/O, and thread-safety
should generally not be an issue (to the extent that your C library
has threadsafe I/O and dynamic memory allocation). The SASA
calculation itself can be parallelized by using a
::freesasa\_parameters struct with ::freesasa\_parameters.n\_threads
\> 1 (default is 2) where appropriate. This only gives a significant
effect on performance for large proteins or at high precision, and
because not all steps are parallelized it is usually not worth it to
go beyond 2 threads.
@section Customizing Customizing behavior
The types ::freesasa\_parameters and ::freesasa\_classifier can be
used to change the parameters of the calculations. Users who wish to
use the defaults can pass `NULL` wherever pointers to these are
requested.
@subsection Parameters Parameters
Calculation parameters can be stored in a ::freesasa\_parameters
object. It can be initialized to default by
~~~{.c}
freesasa_parameters param = freesasa_default_parameters;
~~~
The following code would run a high precision Shrake & Rupley
calculation with 10000 test points on the provided structure.
~~~{.c}
freesasa_parameters param = freesasa_default_parameters;
param.alg = FREESASA_SHRAKE_RUPLEY;
param.shrake_rupley_n_points = 10000;
freesasa_result *result = freesasa_calc_structure(structure, param);
~~~
@subsection Classification Specifying atomic radii and classes
Classifiers are used to determine which atoms are polar or apolar, and
to specify atomic radii. In addition the three standard classifiers
(see below) have reference values for the maximum areas of the 20
standard amino acids which can be used to calculate relative areas of
residues, as in the RSA output.
The default classifier is available through the const variable
::freesasa\_default\_classifier. This uses the *ProtOr* radii, defined
in the paper by Tsai et
al. ([JMB 1999, 290: 253](http://www.ncbi.nlm.nih.gov/pubmed/10388571))
for the standard amino acids (20 regular plus SEC, PYL, ASX and GLX),
for some capping groups (ACE/NH2) and the standard nucleic acids. If
the element can't be determined or is unknown, a zero radius is
assigned. It classes all carbons as *apolar* and all other known atoms
as *polar*.
Early versions of FreeSASA used the atomic radii by Ooi et
al. ([PNAS 1987, 84: 3086-3090](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC304812/)),
this classifier is still available through ::freesasa_oons_classifier.
Users can provide their own classifiers through @ref Config-file. At
the moment these do not allow the user to specify reference values to
calculate relative SASA values for RSA output.
The default behavior of freesasa_structure_from_pdb(),
freesasa_structure_array(), freesasa_structure_add_atom() and
freesasa_structure_add_atom_wopt() is to first try the provided
classifier and then guess the radius if necessary (emitting warnings
if this is done, uses VdW radii defined by [Mantina et al. J Phys Chem
2009, 113:5806](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3658832/)).
See the documentation for these functions for what parameters to use
to change the default behavior.
@page Config-file Classifier configuration files
The configuration files read by freesasa_classifier_from_file() or the
command-line option `-c` should have two sections: `types:` and
`atoms:`, and optionally the section `name:`.
The types-section defines what types of atoms are available
(aliphatic, aromatic, hydroxyl, ...), what the radius of that type is
and what class a type belongs to ('polar' or 'apolar', case
insensitive). The types are just shorthands to associate an atom with
a given combination of class and radius. The user is free to define as
many types and classes as necessary.
The atoms-section consists of triplets of residue-name, atom-name (as
in the corresponding PDB entries) and type. A prototype file would be
~~~
name: myclassifier # tag and value must be on the same line (optional)
types:
C_ALIPHATIC 2.00 apolar
C_AROMATIC 1.75 apolar
N 1.55 polar
atoms:
ANY N N
ANY CA C_ALIPHATIC
ANY CB C_ALIPHATIC
ARG CG C_ALIPHATIC
PRO CB C_AROMATIC # overrides ANY CB
~~~
The residue type `ANY` can be used for atoms that are the same in all
or most residues (such as backbone atoms). If there is an exception
for a given amino acid this can be overridden as is shown for `PRO CB`
in the example.
A few example configurations are available in the directory
[share/](https://github.com/mittinatten/freesasa/tree/master/share). The
configuration-file
[protor.config](https://github.com/mittinatten/freesasa/tree/master/share/protor.config)
is a copy of the default classifier, and can be used to add extra
atoms that need to be classified, while keeping the defaults for the
standard residues (also see the file
[scripts/chemcomp2config.pl](https://github.com/mittinatten/freesasa/tree/master/scripts/)
for instructions on how to generate configurations for new chemical
components semi-automatically). If something common is missing in the
default classifier, [create an
issue](https://github.com/mittinatten/freesasa/issues) on Github so
that it can be added.
FreeSASA also ships with some configuration-files that mimic other
popular programs, such as
[NACCESS](https://github.com/mittinatten/freesasa/tree/master/share/naccess.config)
and
[DSSP](https://github.com/mittinatten/freesasa/tree/master/share/dssp.config).
The static classifiers in the API were generated using
[scripts/config2c.pl](https://github.com/mittinatten/freesasa/tree/master/scripts/)
to convert the correspoding configurations in `share` to C code.
@page Selection Selection syntax
FreeSASA uses a subset of the Pymol select commands to give users an
easy way of summing up the SASA of groups of atoms. This is done by
the function freesasa\_selection\_new() in the C API,
freesasa.selectArea() in the Python interface and the option
`--select` for the command line tool. All commands are case
insensitive. A basic selection has a selection name, a property
selector and a list of arguments
<selection-name>, <selector> <list>
For example
aromatic, resn phe+tyr+trp+his+pro
Several selectors can be joined using boolean logic and parentheses,
<selection-name>, (<s1> <l1>) and not (<s2> <l2> or <s3> <l3>)
where s1, s2 and s3 are selectors and l1, l2 and l3 are lists. The
operator `and` has precedence over `or`, so the second parentheses is
necessary but not the first, in the example above. The selection name
can include letters, numbers and underscores. The name can't be longer
than ::FREESASA\_MAX\_SELECTION\_NAME characters.
The following property selectors are supported
- `resn` Residue names like "ala", "arg", "du", etc
- `resi` Residue index (positive or negative integers)
- `chain` Chain labels (single characters)
- `name` Atom names, such as "ca", "c", "oxt", etc
- `symbol` Element symbols, such as "C", "O", "Se", "Fe", etc.
A list of residues can be selected using
resn ala+val+leu+ile+met
and similarly for the other four selectors. In addition `resi` and
`chain` support ranges
resi 1-10 (residues 1 to 10)
resi -10 (residues indices < 10)
resi 10- (residues indices > 10)
resi 1-10+20-30+35- (residues 1 to 10, 20 to 30 and above 35)
resi \-20-\-15+\-10-5 (residues -20 to -15 and -10 to 5)
chain A+C-E (chains A and C to E, no open intervals allowed here)
Combining ranges with plus signs, as in the three last lines, is not
allowed in Pymol but supported by FreeSASA.
If a selection list contains elements not found in the molecule that
is analyzed, a warning is printed and that part of the list does not
contribute to the selection. Not finding a list element can be because
it specifies a residue that does not exist in the particular molecule,
or because of typos. The selector does not keep a list of valid
elements, residue names, etc.
@page Python Python interface
If Python is enabled using
$ ./configure --enable-python-bindings
Cython is used to build Python bindings for FreeSASA, and `make
install` will install them. The option `--with-python=...` can be
specified to specify which Python binary to use.
Below follow some illustrations of how to use the package, see the
@ref freesasa "package documentation" for details.
@section Python-basics Basic calculations
Using defaults everywhere a simple calculation can be carried out as
follows (assuming the PDB structure 1UBQ is available)
~~~{.py}
import freesasa
structure = freesasa.Structure("1ubq.pdb")
result = freesasa.calc(structure)
area_classes = freesasa.classifyResults(result, structure)
print "Total : %.2f A2" % result.totalArea()
for key in area_classes:
print key, ": %.2f A2" % area_classes[key]
~~~
Which would give the following output
Total : 4804.06 A2
Polar : 2504.22 A2
Apolar : 2299.84 A2
The following does a high precision L&R calculation
~~~{.py}
result = freesasa.calc(structure,
freesasa.Parameters({'algorithm' : freesasa.LeeRichards,
'n-slices' : 100}))
~~~
Using the results from a calculation we can also integrate SASA over a selection of
atoms, using a subset of the Pymol selection syntax (see @ref Selection):
~~~{.py}
selections = freesasa.selectArea(('alanine, resn ala', 'r1_10, resi 1-10'),
structure, result)
for key in selections:
print key, ": %.2f A2" % selections[key]
~~~
which gives the output
alanine : 120.08 A2
r1_10 : 634.31 A2
@section Python-classification Customizing atom classification
This uses the NACCESS parameters (the file 'naccess.config' is
available in the share/ directory of the repository).
~~~{.py}
classifier = freesasa.Classifier("naccess.config")
structure = freesasa.Structure("1ubq.pdb", classifier)
result = freesasa.calc(structure)
area_classes = freesasa.classifyResults(result, structure, classifier)
~~~
Classification can be customized also by extending the Classifier
interface. The code below is an illustration of a classifier that
classes Nitrogens separately, and assigns radii based on element only
(and crudely).
~~~{.py}
import freesasa
import re
class DerivedClassifier(Classifier):
def classify(self, residueName, atomName):
if re.match('\s*N', atomName):
return 'Nitrogen'
return 'Not-nitrogen'
def radius(self, residueName, atomName):
if re.match('\s*N',atomName): # Nitrogen
return 1.6
if re.match('\s*C',atomName): # Carbon
return 1.7
if re.match('\s*O',atomName): # Oxygen
return 1.4
if re.match('\s*S',atomName): # Sulfur
return 1.8
return 0; # everything else (Hydrogen, etc)
classifier = DerivedClassifier()
# use the DerivedClassifier to calculate atom radii
structure = freesasa.Structure("1ubq.pdb", classifier)
result = freesasa.calc(structure)
# use the DerivedClassifier to classify atoms
area_classes = freesasa.classifyResults(result,structure,classifier)
~~~
Of course, this example is somewhat contrived, if we only want the
integrated area of Nitrogen atoms, the simpler choice would be
~~~{.py}
selection = freesasa.selectArea('nitrogen, symbol n', structure, result)
~~~
However, extending freesasa.Classifier, as illustrated above, allows
classification to arbitrary complexity and also lets us redefine the
radii used in the calculation.
@section BioPDB Bio.PDB
FreeSASA can also calculate the SASA of a Bio.PDB structure
~~~{.py}
from Bio.PDB import PDBParser
parser = PDBParser()
structure = parser.get_structure("Ubiquitin", "1ubq.pdb")
result, sasa_classes = freesasa.calcBioPDB(structure)
~~~
If one needs more control over the analysis the structure can be
converted to a freesasa.Structure using freesasa.structureFromBioPDB()
and the calculation can be performed the normal way using this
structure.
@page Geometry Geometry of Lee & Richards' algorithm
This page explains the geometry of the calculations in L&R
and can be used to understand the source code. As far as possible the
code uses similar notation to the formulas here.
We will use the following notation: An atom \f$i\f$ has a van der
Waals radius \f$r_i\f$, the rolling sphere (or *probe*) has radius
\f$r_\text{P}\f$ and when these are added we get an extended radius
\f$R_i = r_i + r_\text{P}\f$. The sphere of radius \f$R_i\f$ centered
at the position of atom \f$i\f$ represents the volume not accessible
to the center of the probe. The SASA for a molecule is then obtained
by calculating the non-buried surface area of the extended spheres.
The L&R algorithm calculates the surface area by slicing the
protein, calculating the length of the solvent exposed contours in
each slice and then adding up the length multiplied by the slice
thickness.
![Slice in atom](../fig/lnr_slice.svg)
Divide atom \f$i\f$ into \f$n\f$ slices, orthogonal to an arbitrary
axis, of thickness \f$\delta = 2R_i/n\f$. The position of the middle
of the slice along that axis is denoted \f$z\f$, and the center of
atom \f$i\f$, along the same axis, is at \f$z_i\f$. In each slice, the
atom is thus a circle of radius \f[R_i^\prime =
\sqrt{R_i^2-(z-z_i)^2}\,.\f] These circles are either completely
buried inside neighboring atoms, completely exposed, or partially
exposed.
![Overlap of circles](../fig/lnr_circles.svg)
The exposed arc lengths for each atom can be calculated exactly. For
each pair of atoms \f$i,j\f$, the distance between their centers
projected on the slice is \f$d_{ij}\f$ (independent of \f$z\f$). If
\f$d_{ij} > R_i^\prime + R_j^\prime\f$, there is no overlap. If
\f$d_{ij} < R_j^\prime - R_i^\prime\f$ circle \f$i\f$ is completely
inside \f$j\f$ (and the other way around). If \f$d_{ij}\f$ lies
between these two cases the angle of circle \f$i\f$ that is buried due
to circle \f$j\f$ is
\f[ \alpha = 2\arccos \bigl[({R_i^\prime}^2_{\,}
+ d_{ij}^2 - {R_{j}^\prime}^2_{\,})/(2R_i^\prime d_{ij})\bigr]. \f]
If the middle point of this arc on the circle is at an angle
\f$\beta\f$, the arc spans the interval
\f$[\beta-\alpha/2,\beta+\alpha/2]\f$. By adding up these arcs and
taking into account any overlap between them we get the total buried
angle \f$\gamma\f$ in this slices. The exposed arc angle for this atom
and slice is thus \f$2\pi-\gamma\f$ and the total SASA of that atom
\f[ A_i =R_i \delta \!\! \sum_{s\in\text{slices}} \!\!
\left[2\pi-\gamma_s\right]\,. \f]
The angle is multiplied by \f$R_i\f$ (not \f$R_i^\prime\f$) to give
the area of a conical frustum circumscribing the sphere at the
slice. Finally, the total area \f$A\f$ is the sum of all \f$A_i\f$.
In FreeSASA, the L\&R SASA calculation begins by finding overlapping
spheres and storing the contacts in an adjacency list. It then
iterates through all the slices of each atom and checks for overlap
with adjacent atoms in each slice, and adds up the exposed arcs to
calculate the atom's contribution to the SASA of the slice. The
calculations for each atom are completely independent and can thus be
parallelized over an arbitrary number of threads, whereas the
calculation of adjacency lists has not been parallelized.