HHpred / HHsearch
From Bioinformatics.Org Wiki
HHsearch is an open-source software program for protein sequence searching that is part of the free HH-suite software package. HHpred is a free protein function and protein structure prediction server that is based on HHsearch and HHblits, another program in the HH-suite package. HHpred and HHsearch are among the most popular methods for protein structure prediction and the detection of remotely related sequences, each having been cited over 500 times.
Sequence searches are frequently performed by biologists to infer the function of an unknown protein from its sequence. For this purpose, the protein's sequence is compared to the sequences of other proteins in public databases, and its function is deduced from those of the most similar sequences. Often, no sequences with annotated functions can be found in such a search. In this case, more sensitive methods are required to identify more remotely related proteins or protein families. From these relationships, hypotheses about the protein's function, structure, and domain composition can be inferred. HHsearch searches databases with a query protein sequence. The HHpred server and the HH-suite software package offer many popular, regularly updated databases, such as the Protein Data Bank, as well as the InterPro, Pfam, Clusters of Orthologous Groups (COG), and SCOP databases.
HHpred and HHsearch belong to the class of profile-profile comparison tools, which includes the most sensitive sequence search methods to date. They represent both the query sequence and the database sequences by sequence profiles, also called position-specific scoring matrices (PSSMs). Profiles are calculated from a multiple sequence alignment of related sequences, which are collected, for example, using the PSI-BLAST program or the HHblits program from the HH-suite package. A profile is a matrix that contains, for each position in the query sequence, a similarity score for each of the 20 amino acids. These scores are calculated from the frequencies of the amino acids at the corresponding positions in the multiple sequence alignment. Because profiles contain much more information than a single sequence (e.g. the position-specific degree of conservation), profile-profile comparison methods are much more powerful than sequence-sequence comparison methods like BLAST or profile-sequence comparison methods like PSI-BLAST.
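The step from alignment to profile described above can be illustrated with a minimal sketch. It is not the scoring scheme used by HHsearch, PSI-BLAST, or HHblits: the function name `build_pssm`, the toy alignment, the uniform background frequencies, and the simple additive pseudocount are all assumptions made for illustration. Real tools use sequence weighting, empirical background frequencies, and more elaborate pseudocounts.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(alignment, pseudocount=1.0):
    """Return one dict per alignment column, mapping amino acid -> log-odds score.

    Illustrative only: assumes a uniform background distribution and a flat
    additive pseudocount, which real profile builders do not use.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for col in range(length):
        # Count residues in this column; gaps and unknown symbols are skipped.
        counts = Counter(row[col] for row in alignment if row[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


# Hypothetical toy alignment of four related sequences ('-' marks a gap).
toy_alignment = ["MKVL", "MRVL", "MKIL", "MK-L"]
pssm = build_pssm(toy_alignment)

for position, scores in enumerate(pssm, start=1):
    best = max(scores, key=scores.get)
    print(f"position {position}: best residue {best}, score {scores[best]:.2f}")
```

A fully conserved column (position 1 in the toy alignment) receives a higher score for its residue than a partly conserved column does, which is the position-specific conservation signal that a single sequence cannot carry.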
HHpred and HHsearch represent query and database proteins by profile hidden Markov models (HMMs), an extension of sequence profiles that also records position-specific amino acid insertion and deletion frequencies. HHsearch searches a database of HMMs with a query HMM. Before starting the search through the actual database of HMMs, HHsearch/HHpred builds a multiple sequence alignment of related sequences using the HHblits program from the HH-suite package. From this alignment, a profile HMM is calculated. The databases contain HMMs that are precalculated in the same fashion using PSI-BLAST. The output of HHpred and HHsearch is a ranked list of database matches (including E-values and probabilities for a true relationship) and the pairwise query-database sequence alignments. A search through the PDB database of proteins with solved 3D structure takes a few minutes. If a significant match with a protein of known structure (a "template") is found in the PDB database, HHpred allows the user to build a homology model using the MODELLER software, starting from the pairwise query-template alignment.
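The two-stage workflow described above (first build a query alignment with HHblits, then search an HMM database with HHsearch) might be scripted roughly as follows. This is a sketch under stated assumptions: the helper name `build_pipeline_commands`, the file names, and the database paths are hypothetical, and the flags shown (`-i`, `-d`, `-oa3m`, `-n`, `-o`) should be checked against the help output of the installed HH-suite version. The code only assembles and prints the command lines; it does not run them.

```python
import os
import shlex


def build_pipeline_commands(query_fasta, msa_db, hmm_db, iterations=3):
    """Assemble hhblits and hhsearch command lines for one query.

    Hypothetical helper: paths and the default iteration count are
    illustrative choices, not values taken from the HHpred server.
    """
    stem, _ = os.path.splitext(query_fasta)
    a3m_path = stem + ".a3m"  # query multiple sequence alignment
    hhr_path = stem + ".hhr"  # ranked hit list and pairwise alignments

    # Stage 1: collect related sequences and write the query alignment.
    hhblits_cmd = [
        "hhblits", "-i", query_fasta, "-d", msa_db,
        "-oa3m", a3m_path, "-n", str(iterations),
    ]
    # Stage 2: compare the query HMM built from that alignment
    # against a database of precalculated HMMs.
    hhsearch_cmd = ["hhsearch", "-i", a3m_path, "-d", hmm_db, "-o", hhr_path]
    return hhblits_cmd, hhsearch_cmd


# Hypothetical input file and database locations.
hhblits_cmd, hhsearch_cmd = build_pipeline_commands(
    "query.fasta", "databases/uniclust30", "databases/pdb70"
)
print(shlex.join(hhblits_cmd))
print(shlex.join(hhsearch_cmd))
```

With HH-suite installed, each list could be passed to `subprocess.run(cmd, check=True)`; keeping command construction separate from execution makes the pipeline easy to inspect before anything is run.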
Applications of HHpred and HHsearch include protein structure prediction, function prediction, domain prediction, domain boundary prediction, and evolutionary classification of proteins.
HHpred servers have been ranked among the best servers during the last three CASP blind protein structure prediction experiments. In the last CASP, CASP9, the HHpredA, HHpredB, and HHpredC servers were ranked 1st, 2nd, and 3rd out of 81 participating automatic structure prediction servers in template-based modeling, and 6th, 7th, and 8th on all 147 targets, while being much faster than the best 20 servers. In CASP8, HHpred was ranked 7th on all targets and 2nd on the subset of single-domain proteins, while still being more than 50 times faster than the top-ranked servers.
- HH-suite software package
- Position-specific scoring matrix
- Multiple sequence alignment
- CASP - Critical Assessment of Techniques for Protein Structure Prediction
- BLAST (Basic Local Alignment Search Tool)
- Context-specific BLAST (CS-BLAST)
- 2005. Protein homology detection by HMM-HMM comparison. Bioinformatics 21(7):951–960.
- 2005. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Research 33(Web Server issue):W244–248.
- Number of results returned from a search on Google Scholar. (Google Scholar search)
- 2000. Improving the quality of twilight-zone alignments. Protein Science 9(8):1487–1496.
- 2003. Profile–profile comparisons by COMPASS predict intricate homologies between protein families. Protein Science 12(10):2262–2272.
- 2006. Sequence comparison and protein structure prediction. Current Opinion in Structural Biology 16(3):374–384.
- Official CASP9 results for the template-based modeling category (121 targets)
- Official CASP9 results for all 147 targets
- 2009. Fast and accurate automatic structure prediction with HHpred. Proteins 77(Suppl 9):128–32.