Searching for structurally similar proteins

PACKAGE:charite.christo.strap. The fold of a protein is much more conserved in evolution than its amino acid sequence. It is therefore possible to identify remote homologs of a given protein by looking for structures with the same fold, whereas such distantly related proteins are often missed by sequence search techniques such as BLAST. Aligning a protein with its homologs provides important clues about its function and evolution. For example, the alignment distinguishes the parts of the protein that are shared by all members of the protein family from the parts that are unique to that protein. Highly conserved regions are often functionally important. Unlike sequence alignments, 3D alignments reliably match functionally important residues such as active-site residues (marked green in the alignment).

Computational steps

For a given PDB structure, the following steps are performed automatically:
  1. The chains of the reference protein are separated and clustered. Structurally highly similar chains are grouped together, and one representative of each group is used in the following steps.
  2. Lists of structurally similar proteins are retrieved from four different Web services. These servers apply different superposition methods to different non-redundant protein sets, so the user will find more interesting remote homologs than with any single method. Each database is updated periodically on its own schedule, so very recent PDB files (less than six months old) may not be included yet. Similarity is generally based on 3D structure, irrespective of the amino acid sequence.
  3. Initially, the first few hits are displayed. The user can add structures to the alignment or remove them by checking or unchecking the corresponding check boxes.
  4. The selected similar chains are loaded from the PDB and aligned to the reference protein. The alignment is performed by superimposing all 3D-structures with each other using TM-align (PUBMED:15849316).
  5. Finally, all resulting pairwise alignments are assembled into one multiple sequence alignment, which is shown in the lower part of the screen. The alignment can be exported to a word processor such as MS_Word by pressing BUTTON:"<i><b><font color=0000FF>W</font></b></i>". The superimposed 3D models are shown in the 3D view.
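Step 5 does not say how STRAP combines the pairwise results, so the following Java sketch shows one common technique, center-star merging, under stated assumptions: each pairwise alignment is given as two gapped rows of equal length, the first of which is the reference, and '-' marks a gap. Only reference-versus-partner pairs are used, not the all-against-all superpositions. Residues that a partner inserts between two reference residues get extra columns, padded with gaps in all other rows. The class StarMerge and its method merge are hypothetical names, not part of STRAP.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not STRAP code): center-star merging of pairwise
 * alignments that all contain the same ungapped reference sequence.
 */
final class StarMerge {

    /**
     * @param reference ungapped reference sequence
     * @param pairs     each element is {gapped reference row, gapped partner row}
     * @return the MSA rows: reference first, then one row per partner
     */
    static List<String> merge(String reference, List<String[]> pairs) {
        int n = reference.length();
        List<StringBuilder[]> inserts = new ArrayList<>(); // per partner: residues inserted before reference position i (i == n: after the end)
        List<char[]> matched = new ArrayList<>();          // per partner: residue or '-' opposite reference position i
        int[] maxIns = new int[n + 1];                     // widest insertion at each position over all partners

        for (String[] pair : pairs) {
            String refRow = pair[0], otherRow = pair[1];
            if (refRow.length() != otherRow.length())
                throw new IllegalArgumentException("rows of a pair differ in length");
            if (!refRow.replace("-", "").equals(reference))
                throw new IllegalArgumentException("reference row does not spell the reference");

            StringBuilder[] ins = new StringBuilder[n + 1];
            for (int i = 0; i <= n; i++) ins[i] = new StringBuilder();
            char[] match = new char[n];
            int refPos = 0;
            for (int col = 0; col < refRow.length(); col++) {
                char r = refRow.charAt(col), o = otherRow.charAt(col);
                if (r == '-') {
                    if (o != '-') ins[refPos].append(o); // partner residue with no reference counterpart
                } else {
                    match[refPos++] = o;                  // partner residue or gap opposite a reference residue
                }
            }
            for (int i = 0; i <= n; i++) maxIns[i] = Math.max(maxIns[i], ins[i].length());
            inserts.add(ins);
            matched.add(match);
        }

        List<String> rows = new ArrayList<>();
        StringBuilder ref = new StringBuilder();
        for (int i = 0; i <= n; i++) {
            ref.append("-".repeat(maxIns[i]));            // open columns for the widest insertion here
            if (i < n) ref.append(reference.charAt(i));
        }
        rows.add(ref.toString());
        for (int k = 0; k < inserts.size(); k++) {
            StringBuilder row = new StringBuilder();
            for (int i = 0; i <= n; i++) {
                StringBuilder ins = inserts.get(k)[i];
                row.append(ins).append("-".repeat(maxIns[i] - ins.length())); // left-aligned insertion, padded with gaps
                if (i < n) row.append(matched.get(k)[i]);
            }
            rows.add(row.toString());
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> pairs = List.of(
                new String[] {"AC-DE", "ACGDE"},   // partner 1 has an extra G before D
                new String[] {"ACDE-", "A-DEF"});  // partner 2 lacks C and has a trailing F
        merge("ACDE", pairs).forEach(System.out::println);
        // prints AC-DE-, ACGDE-, A--DEF (one per line)
    }
}
```

Because every row is anchored to the reference, the merge is unambiguous and runs in time linear in the total alignment length. Its drawback is that insertions of different partners at the same reference position are simply stacked left-aligned rather than aligned to each other.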

The result pane

The search result is summarised in the result pane, whose tab is located on the left. The headline shows the PDB ID of the current reference protein, a code of four letters and digits. You can change this ID to start the computation with a different reference protein.
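A minimal Java sketch of how an ID typed into the headline might be checked before a new computation is started. It assumes the classic four-character PDB format, in which the first character is a digit from 1 to 9 and the other three are letters or digits; the class PdbIds and the method normalize are hypothetical, not STRAP's actual input handling.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helper: validate and normalize an ID typed into the result-pane headline. */
final class PdbIds {
    // Assumed classic PDB ID: a digit 1-9 followed by three letters or digits.
    private static final Pattern PDB_ID = Pattern.compile("[1-9][A-Za-z0-9]{3}");

    /** Returns the upper-cased ID, or an empty Optional if the input is not a 4-character PDB ID. */
    static Optional<String> normalize(String input) {
        String id = input.trim();
        return PDB_ID.matcher(id).matches()
                ? Optional.of(id.toUpperCase(Locale.ROOT))
                : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(normalize(" 1crn "));  // Optional[1CRN]
        System.out.println(normalize("crn1"));    // Optional.empty
    }
}
```

Checking the format locally is cheap; whether the entry actually exists can only be determined by querying the PDB.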

Multimeric proteins

The protein under consideration may be multimeric, i.e. it may consist of more than one peptide chain. Chains of one protein may be identical (same amino acid sequence), may have a similar 3D structure despite different amino acid sequences, or may have completely different folds. Chain identifiers are capital letters or digits. By convention, a particular chain is designated by appending a colon and the chain identifier to the PDB ID. To avoid showing redundant results, the following measures are taken for proteins with more than one chain: similar subunits are grouped together, whereas each completely different subunit forms a group of its own. Each group gets its own result pane. The tabs of the result panes are located at the bottom and are labeled with the chain identifiers of the query protein. If two subunits have exactly the same amino acid sequence, their chain IDs are written tightly together in the tab. For each group, one representative protein chain is determined and used for all subsequent steps described above.
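The grouping of identical chains and the resulting tab labels can be sketched in Java as follows. This is a simplified, hypothetical illustration (ChainGroups, tabLabels, designate and representative are invented names): it only groups chains with exactly the same amino acid sequence, whereas the grouping described above also merges chains with similar 3D structures, which would require a structural comparison. Choosing the first chain of each group as its representative is likewise an assumption, and the sequences in main are made-up toy values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: group the chains of one PDB entry by identical amino acid
 * sequence. Grouping by 3D similarity (different sequences, same fold) is not covered.
 */
final class ChainGroups {

    /** Returns one tab label per group, e.g. "AC" for chains A and C with identical sequences. */
    static List<String> tabLabels(Map<Character, String> sequenceByChain) {
        Map<String, StringBuilder> idsBySequence = new LinkedHashMap<>(); // keeps first-seen order
        for (Map.Entry<Character, String> e : sequenceByChain.entrySet()) {
            idsBySequence.computeIfAbsent(e.getValue(), s -> new StringBuilder()).append(e.getKey());
        }
        List<String> labels = new ArrayList<>();
        for (StringBuilder ids : idsBySequence.values()) labels.add(ids.toString());
        return labels;
    }

    /** Designates a chain by appending a colon and the chain ID to the PDB ID, e.g. "1ABC:A". */
    static String designate(String pdbId, char chainId) {
        return pdbId + ":" + chainId;
    }

    /** Uses the first chain of a group as its representative (an assumption of this sketch). */
    static String representative(String pdbId, String tabLabel) {
        return designate(pdbId, tabLabel.charAt(0));
    }

    public static void main(String[] args) {
        Map<Character, String> chains = new LinkedHashMap<>();
        chains.put('A', "MKTAYIAK");   // toy sequences, not real PDB data
        chains.put('B', "GSHMLEDP");
        chains.put('C', "MKTAYIAK");
        List<String> labels = tabLabels(chains);
        System.out.println(labels);                                 // [AC, B]
        System.out.println(representative("1ABC", labels.get(0))); // 1ABC:A
    }
}
```

With the toy input, chains A and C share a tab labeled "AC" and chain B gets its own tab, mirroring the tab-labeling convention for identical subunits.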

Computation speed

3D superposition is time-consuming. It is performed by the program TM_align in the directory STRING:ChUtils#dirBinaries(). Its Fortran source has the file ending ".f". You can speed it up by WIKI:compiling this program with a commercial Fortran compiler to generate an WIKI:Executable tailored to your CPU.
Macintosh: On Intel Macintosh computers, the speed could be improved even further, because STRAP currently uses the PPC version on Intel Macintosh as well. Here are some links that might help you install a compiler for Intel Macintosh: