Searching for structurally similar proteins
PACKAGE:charite.christo.strap.
The fold of a protein is much more conserved in evolution than the
amino acid sequence.
Therefore it is possible to identify remote homologs for a given protein by looking for structures
with the same fold, while distantly related proteins are often missed by sequence
search techniques such as BLAST.
The alignment of a protein with homologous
proteins provides important clues for function and evolution.
For example, it allows parts of the protein that are
common to all members of the protein family to be distinguished from those parts that are
unique to that protein.
Highly conserved protein regions are often functionally important.
Unlike sequence alignment techniques, 3D alignments generally match functionally important
residues such as active site residues (marked green in the alignment) correctly.
Computational steps
For a given PDB structure the following steps are performed
automatically.
-
The protein chains of the reference protein are separated and clustered.
Structurally highly similar chains are grouped together and one
representative from the group is used for the following steps.
-
Lists of structurally similar proteins are retrieved from
four different Web-services.
- GangstaPlus* PUBMED:17118190
- CE_CL* PUBMED:11125099 PUBMED:9796821
- Dali* PUBMED:8578593
- NCBI-VAST* PUBMED:8804824
These four servers apply different superposition methods to different
nonredundant protein sets.
Users will therefore find more interesting remote homologs than with a single method.
The databases are updated periodically at different times. Very recent
PDB files (less than six months old) may not have been included yet.
Similarity is generally based on structural similarity,
irrespective of the amino acid sequence.
- Initially, the first few hits are displayed. If desired, the user
can add or remove structures to the alignment by activating or
deactivating check-boxes.
- The selected similar chains are loaded from the PDB and aligned to the reference protein.
The alignment is performed by superimposing all 3D-structures with
each other using TM-align (PUBMED:15849316).
- Finally, all resulting pair alignments are assembled into one
multiple sequence alignment, which is shown in the lower part of the
screen. The alignment can be exported to a word processor such as MS_Word
by pressing
BUTTON:"<i><b><font color=0000FF>W</font></b></i>".
The superimposed 3D models are shown in the 3D view.
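The last step, assembling pair alignments into one multiple alignment, can be illustrated with a minimal Python sketch. It assumes that every pair alignment is given as two gapped strings of equal length (reference first) and that gaps are written as "-". The function names are hypothetical, and the source does not state how STRAP itself performs this step, so this is only one simple way to do it.

```python
GAP = "-"


def split_pair(ref_aln, other_aln):
    """Split one pair alignment relative to the reference residues.

    Returns (inserts, matches):
      inserts[k] = residues of the other protein placed before reference
                   residue k (k == n means after the last residue)
      matches[k] = character of the other protein aligned to residue k
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned strings must have equal length")
    inserts = [""]
    matches = []
    for r, o in zip(ref_aln, other_aln):
        if r == GAP:
            if o != GAP:
                inserts[-1] += o  # insertion relative to the reference
        else:
            matches.append(o)
            inserts.append("")
    return inserts, matches


def merge_pairwise_to_reference(reference, pairs):
    """Merge pair alignments that share one ungapped reference sequence."""
    n = len(reference)
    parts = []
    for ref_aln, other_aln in pairs:
        if ref_aln.replace(GAP, "") != reference:
            raise ValueError("pair alignment does not match the reference")
        parts.append(split_pair(ref_aln, other_aln))

    # Widest insertion seen in each slot between reference residues.
    width = [max((len(ins[k]) for ins, _ in parts), default=0)
             for k in range(n + 1)]

    def build(inserts, matches):
        cols = []
        for k in range(n + 1):
            cols.append(inserts[k].ljust(width[k], GAP))  # pad with gaps
            if k < n:
                cols.append(matches[k])
        return "".join(cols)

    rows = [build([""] * (n + 1), list(reference))]
    rows += [build(ins, m) for ins, m in parts]
    return rows


rows = merge_pairwise_to_reference(
    "ACDE", [("AC-DE", "ACGDE"), ("ACDE", "A-DE")])
print("\n".join(rows))
```

Insertions that different proteins have at the same reference position share columns here but are not aligned to each other; a real implementation may treat them more carefully.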
The result pane
The search result is summarised in the result pane.
The tab of this pane is found at the left.
In the headline you find the current PDB ID, which consists of four
characters (letters and digits). You can change this ID to start the computation with a
different reference protein.
Multimeric proteins
The protein under consideration may be multimeric, i.e. it may have
more than one peptide chain.
Chains within one protein may be identical (same amino acid sequence),
may have a similar 3D structure despite different amino acid sequences,
or may have completely different folds.
Chain identifiers are capital letters or digits.
By convention a particular chain is designated by appending a colon
and the chain identifier to the PDB ID.
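The naming convention can be made concrete with a small Python sketch. It assumes a PDB ID of four letters or digits and a single-character chain identifier (capital letter or digit), as described above. The function name is hypothetical and not part of STRAP.

```python
import re

# Assumption: four alphanumeric characters, optionally followed by
# a colon and one chain identifier (capital letter or digit).
_CHAIN_DESIGNATION = re.compile(r"([0-9A-Za-z]{4})(?::([A-Z0-9]))?")


def parse_chain_designation(text):
    """Split 'PDBID:chain' into (pdb_id, chain); chain is None if absent."""
    match = _CHAIN_DESIGNATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid chain designation: {text!r}")
    pdb_id, chain = match.groups()
    return pdb_id.upper(), chain


print(parse_chain_designation("1abc:A"))
print(parse_chain_designation("2HHB"))
```

The PDB ID is upper-cased for display only; the chain identifier is left untouched because its case is significant.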
To avoid showing redundant results, the following measures
are taken for proteins with more than one chain:
Similar subunits are grouped together whereas completely different
subunits form a group on their own.
For each group a different result pane is used.
The tabs of the result panes are located at the bottom and
are labeled with the chain identifiers of the query
protein.
If two subunits have exactly the same amino acid sequence, this is indicated by
writing their chain IDs tightly together in the tab.
For each group one representative protein chain is determined. This
representative is used for all subsequent steps as described above.
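The grouping described above might look roughly like the following Python sketch. Several parts are assumptions: `is_similar` is a hypothetical callback standing in for whatever structural comparison is used, the first chain of each group is taken as its representative, and a space is used in the tab label to separate similar but non-identical chains. The source specifies only that identical chains are written tightly together.

```python
def group_chains(sequences, is_similar):
    """Group the chains of one protein.

    sequences  -- dict mapping chain ID (one character) to its sequence
    is_similar -- hypothetical callable(chain_a, chain_b) -> bool that
                  reports whether two chains have a similar 3D structure
    Returns a list of groups; each group is a list of blocks, and each
    block is a string of chain IDs with identical sequences.
    """
    # Step 1: chains with exactly the same sequence form one block.
    by_sequence = {}
    for chain_id, seq in sequences.items():
        by_sequence.setdefault(seq, []).append(chain_id)
    blocks = ["".join(ids) for ids in by_sequence.values()]

    # Step 2: greedily attach each block to the first group whose
    # representative chain is structurally similar, else open a new group.
    groups = []
    for block in blocks:
        for group in groups:
            if is_similar(representative(group), block[0]):
                group.append(block)
                break
        else:
            groups.append([block])
    return groups


def representative(group):
    """Assumption: the first chain of the first block represents the group."""
    return group[0][0]


def tab_label(group):
    """Identical chains tightly together; the space separator is assumed."""
    return " ".join(group)


chains = {"A": "MKV", "B": "MKV", "C": "MRV", "D": "GGGG"}
similar = lambda a, b: {a, b} <= {"A", "B", "C"}  # toy similarity test
for g in group_chains(chains, similar):
    print(tab_label(g), "representative:", representative(g))
```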
Computation speed
3D-superposition is time consuming. It is
conducted by the program TM_align in the
directory STRING:ChUtils#dirBinaries(). The Fortran source has the file ending ".f".
You can enhance speed by WIKI:compiling this program with a commercial Fortran compiler
to generate a WIKI:Executable specially tailored for your CPU.
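As a rough illustration of such a build, the following Python sketch assembles a compiler command line. Everything specific in it is an assumption: the file names `TMalign.f` and `TMalign` are placeholders, and the free `gfortran` compiler with `-O3 -march=native` is used only as an example of CPU-specific optimisation. A commercial compiler will have its own options.

```python
import shlex


def fortran_build_command(source, output,
                          compiler="gfortran", cpu_flag="-march=native"):
    """Return the argument list for an optimised, CPU-specific build.

    -O3 enables aggressive optimisation; -march=native asks GCC-based
    compilers to generate code for the CPU of the build machine.
    """
    return [compiler, "-O3", cpu_flag, "-o", output, source]


# Placeholder file names; adjust to the files in your binaries directory.
command = fortran_build_command("TMalign.f", "TMalign")
print(shlex.join(command))

# To actually run the build (requires the compiler to be installed):
# import subprocess
# subprocess.run(command, check=True)
```

An executable built with `-march=native` may not run on older CPUs, so it should be built on the machine where it will be used.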
Macintosh
On Intel Macintosh the speed could be improved even further,
since STRAP currently uses the PPC version on Intel Macintosh as well.
Here are some links which might help to install a compiler for Intel-Macintosh:
- http://ftp.g95.org/
- http://hpc.sourceforge.net/g95
- http://www.intel.com/cd/software/products/asmo-na/eng/compilers/fmac/267426.htm
- http://www-306.ibm.com/software/awdtools/fortran/xlfortran/features/xlf-mac.html