Accessible Co-Evolution Suite




- Scroll down or select an option on the left menu. Hosted by Bioinformatics.org -
- Site under construction! -

Accessible Co-Evolution Suite of Software Tools



Welcome to the homepage for the Accessible Co-Evolution Suite group. Here you can find a suite of software tools to facilitate analysis of co-evolution within or between proteins and nucleic acids. These tools are provided for academic use and are open source, to enable optimization or customization for your particular research question. You can read more about each tool below (Pipeline Description) or select the .zip package to download the tool from the menu bar on the left. Each package contains an executable .JAR file, the source .JAVA file, and documentation such as a readme file (.TXT), user manual (.PDF), and example data.

In addition to the available software, this site also contains other helpful resources for anybody interested in carrying out co-evolution analysis. These can be found both on this page and on the menu to the left. In the future, web-server versions of some of the tools will also be available.

Molecular Co-Evolution



Interactions between residues and subunits of molecules such as proteins and nucleic acids help govern many important cellular behaviors. Thus, to understand these behaviors, and to gain the insight necessary to modify and engineer them, we must ultimately understand the underlying molecular mechanisms and interactions. Highly conserved residues are often important for molecular function, but these are mostly invariant among organisms and cannot be easily reconfigured. However, residues that appear poorly conserved may still be critical for molecular function if the evolutionary changes that occur in that molecule are compensated by simultaneous changes in the interacting molecule (or, for molecular structure, within the same molecule). Two residues that display this pattern are said to co-evolve.

Identification of co-evolving residue pairs is significant and worthwhile. These residue pairs are predicted to form functionally important, sequence-specific interactions (of course, such bioinformatic predictions must be validated by experiment). Unlike highly conserved residues, these pairs can be reconfigured to generate new interactions or new cellular behaviors, or to develop orthogonal sets of interactions in an organism.



Co-evolution can be detected by measuring the co-variation between two residues in one or more sets of multiple sequence alignments (MSAs). This measurement is made by calculating the mutual information for each pair. Mutual information quantifies how far the observed frequency of a particular pair of identities departs from what would be expected by coincidence, given the frequency of each identity at its own position.
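As a rough illustration of the mutual information calculation described above, here is a minimal Java sketch for a single pair of alignment columns. It is not the code used by the ACeS tools: the class and method names are hypothetical, each column is assumed to be supplied as a string with one character per sequence, gaps are treated as ordinary characters, and none of the corrections discussed in the literature (for example, for phylogeny or small sample size) are applied.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; names are hypothetical, not part of ACeS. */
final class MutualInformationSketch {

    /**
     * Mutual information (in bits) between two alignment columns.
     * colA.charAt(k) and colB.charAt(k) must come from the same sequence
     * (or the same organism, for an inter-molecular comparison).
     */
    static double mutualInformation(String colA, String colB) {
        if (colA.length() != colB.length() || colA.isEmpty()) {
            throw new IllegalArgumentException("columns must be non-empty and of equal length");
        }
        int n = colA.length();
        Map<Character, Integer> countA = new HashMap<>();
        Map<Character, Integer> countB = new HashMap<>();
        Map<String, Integer> countAB = new HashMap<>();

        // Count each identity separately and each observed pair of identities.
        for (int k = 0; k < n; k++) {
            char a = colA.charAt(k);
            char b = colB.charAt(k);
            countA.merge(a, 1, Integer::sum);
            countB.merge(b, 1, Integer::sum);
            countAB.merge("" + a + b, 1, Integer::sum);
        }

        // Sum p(a,b) * log2( p(a,b) / (p(a) * p(b)) ) over the observed pairs.
        double mi = 0.0;
        for (Map.Entry<String, Integer> e : countAB.entrySet()) {
            double pAB = e.getValue() / (double) n;
            double pA = countA.get(e.getKey().charAt(0)) / (double) n;
            double pB = countB.get(e.getKey().charAt(1)) / (double) n;
            mi += pAB * (Math.log(pAB / (pA * pB)) / Math.log(2.0));
        }
        return mi;
    }

    public static void main(String[] args) {
        // Toy columns: the first pair co-varies perfectly, the second not at all.
        System.out.println(mutualInformation("AACC", "GGTT"));
        System.out.println(mutualInformation("AACC", "GTGT"));
    }
}
```

In this toy input, the first pair of columns changes in lockstep and scores 1 bit, while the second pair varies independently and scores 0. A fully conserved pair of columns also scores 0, which is why mutual information highlights co-variation rather than conservation.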

Pipeline for Co-evolution Analysis



The ACeS suite of tools together forms a pipeline for analysis of co-evolution for any protein or nucleic acid(s). Depending on your research question, you may need one or more of these tools in sequence. These include tools for automated retrieval of sequences, proper formatting of sequences for downstream applications, and filtering of unwanted sequences based on a number of criteria. Not yet included in the suite is a program for aligning the final sets of sequences; such tools are available elsewhere. Finally, aligned sequences can be analyzed to determine the co-evolution for all residue pairs within and between them. This pipeline is diagrammed in the figure below.

Here is a brief description of each tool in the pipeline:

Nucleotide Fetch
Given a DocSum file (which can be obtained via an NCBI eBOT), this program will retrieve the nucleotide sequences from NCBI and write them to a destination file.

Sequence Numbering
Given a FASTA file, number each sequence to avoid duplicate headers.

USCS tRNA Reformat
This tool is specific to tRNA sequences from the UCSC-hosted database GtRNAdb; it will, either individually or in batch, convert the tRNA sequence files from that database into a format consistent with the rest of the pipeline.

Sequence Name Filter
Matches search terms to the description field of gene (protein or nucleotide) sequences, removing sequences that do not match. Used to eliminate unwanted sequences retrieved via automatic means.

Combine FASTA
This Java program allows several FASTA files, each containing one or more sequences, to be combined without duplication.

Taxonomy Filter
Given two FASTA files, this module will filter each so that only entries representing a taxonomic group present in both will be kept; others will be put in a discard pile.

MSA Simulator
Generates simulated multiple sequence alignments (MSAs) given parameters such as number of sequences, sequence length, identity or group conservation, and both intra- and inter-molecular co-evolution.

Mutual Information of Biological Molecules
Determines intra- and inter-molecular mutual information within and between one or more protein or nucleic acid MSAs, generating both raw and processed data and heatmaps.
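To give a feel for the kind of FASTA handling the formatting and filtering tools above perform, here is a minimal Java sketch of two of the steps: filtering records by a search term in the header, and numbering headers so that no two are identical. It is an illustration under stated assumptions, not the ACeS source: the class name, method names, and the `>1_` numbering format are all hypothetical, and the sequences in `main` are toy data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Illustrative sketch only; names and header format are hypothetical. */
final class FastaSketch {

    /** Prefix each header with a running number: ">desc" becomes ">1_desc". */
    static List<String> numberHeaders(List<String> fastaLines) {
        List<String> out = new ArrayList<>();
        int counter = 0;
        for (String line : fastaLines) {
            if (line.startsWith(">")) {
                counter++;
                out.add(">" + counter + "_" + line.substring(1));
            } else {
                out.add(line); // sequence lines pass through unchanged
            }
        }
        return out;
    }

    /** Keep only records whose header contains the term (case-insensitive). */
    static List<String> keepMatching(List<String> fastaLines, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        boolean keep = false; // lines before the first header are dropped
        for (String line : fastaLines) {
            if (line.startsWith(">")) {
                keep = line.toLowerCase(Locale.ROOT).contains(needle);
            }
            if (keep) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy input: two records share a header, one record is unwanted.
        List<String> fasta = Arrays.asList(
            ">SmpB example", "MTKK",
            ">SmpB example", "MTKR",
            ">hypothetical protein", "MAAA");
        List<String> result = numberHeaders(keepMatching(fasta, "smpb"));
        result.forEach(System.out::println);
    }
}
```

Filtering before numbering keeps the numbers contiguous in the output. The real tools read from and write to files and offer more options; consult each tool's readme and user manual for the actual behavior.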

Research Group



This research group was founded by Devin Camenares, an Assistant Professor in the Dept. of Biological Sciences at Kingsborough Community College. This was a project born out of necessity; during his time as a graduate student in the lab of Dr. Wali Karzai at Stony Brook University, Dr. Camenares was interested in determining the amount of co-evolution between SmpB and tmRNA. A literature search uncovered examples where intramolecular protein co-evolution had been studied, and even a study that focused on intermolecular co-evolution between protein and RNA. However, most of these papers did not make their software or methods readily available. Despite no formal training in programming, Dr. Camenares learnt enough of several languages to eventually develop the first set of tools available here.

Current Members:

Devin Camenares, Assistant Professor, Dept. of Biological Sciences, Kingsborough Community College CUNY. Contact via email, follow on Twitter, Research Gate, or view CV.

Acknowledgements:

A special thanks to Christopher Camenares, who helped guide some of the programming efforts.

We welcome any interest in collaborations with other researchers, programmers, or bioinformatic enthusiasts. If you feel passionate about advancing co-evolution analysis, and making these tools available to the general scientific community, we would love to work with you!

Our Publications and Citations



This section will feature any publications generated directly or indirectly by the work of the group. If you used, either wholly or in part, any of the tools featured here (either in original form or modified), please contact Professor Camenares so that your work can be featured here as well.



External Websites and Tools



  • Clustal Omega: A tool for aligning multiple sequences together. An improved version of the software known as Clustal-W, available from the same site. Made available by EMBL-EBI.

  • MISTIC: Mutual Information Server to Infer Coevolution, a web-based tool that performs mutual information analysis for intramolecular co-evolution of proteins.

Relevant Publications



Below are publications from authors unconnected to this group that are still relevant to co-evolution analysis.

[1] C.M. Buslje, E. Teppa, T. Di Doménico, J.M. Delfino, M. Nielsen, Networks of high mutual information define the structural proximity of catalytic sites: Implications for catalytic residue identification, PLoS Comput. Biol. 6 (2010). doi:10.1371/journal.pcbi.1000978.

[2] A. Chatterjee, H. Xiao, P.G. Schultz, Evolution of multiple, mutually orthogonal prolyl-tRNA synthetase/tRNA pairs for unnatural amino acid mutagenesis in Escherichia coli, Proc. Natl. Acad. Sci. 109 (2012) 14841–14846.

[3] J. Hummel, N. Keshvari, W. Weckwerth, J. Selbig, Species-specific analysis of protein sequence motifs using mutual information, BMC Bioinformatics. 6 (2005) 164. doi:10.1186/1471-2105-6-164.

[4] F.L. Simonetti, E. Teppa, A. Chernomoretz, M. Nielsen, C. Marino Buslje, MISTIC: Mutual information server to infer coevolution, Nucleic Acids Res. 41 (2013). doi:10.1093/nar/gkt427.

[5] L.C. Martin, G.B. Gloor, S.D. Dunn, L.M. Wahl, Using information theory to search for co-evolving residues in proteins, Bioinformatics. 21 (2005) 4116–24. doi:10.1093/bioinformatics/bti671.

[6] C.M. Buslje, J. Santos, J.M. Delfino, M. Nielsen, Correction for phylogeny, small number of observations and data redundancy improves the identification of coevolving amino acid pairs using mutual information, Bioinformatics. 25 (2009) 1125–1131. doi:10.1093/bioinformatics/btp135.

[7] W. Mao, C. Kaya, A. Dutta, A. Horovitz, I. Bahar, Comparative study of the effectiveness and limitations of current methods for detecting sequence coevolution, Bioinformatics. 31 (2015) 1929–37. doi:10.1093/bioinformatics/btv103.