CAT-O: conCATenate Orthologous sequences
 
 
ABOUT
 
 

Sequences that share a common evolutionary origin provide useful information about the relationships among organisms. The genome of an organism has been shown to retain ample evolutionary signal for constructing a phylogenetic tree (Brown et al. 2001). However, it is unlikely that any single gene or protein will ever be able to reconstruct robust universal trees (Baldauf et al. 2000). Combined sequences have therefore been used to obtain strong support in phylogenetic analyses. Researchers have to concatenate sequences either manually, which is tedious, or by writing their own programs, which is not feasible for non-specialists. We have therefore developed CAT-O, an online server that identifies orthologous, unique and duplicated gene/protein sequences in multiple organelle (chloroplast/mitochondrial) genomes. The identified orthologs can be used to generate a concatenated sequence dataset for phylogenetic tree reconstruction.

The server side is implemented in Perl (CGI), and the client side in HTML and JavaScript. CAT-O uses the Basic Local Alignment Search Tool (BLAST; Altschul et al. 1997) to compare sequences with specified parameters, and the bidirectional best hit (BBH; Overbeek et al. 1999) method to identify orthologs. Multiple sequence alignments of selected orthologs can be generated using MUSCLE (Edgar 2004). Orthologous sequences from multiple organisms can be combined to generate a concatenated sequence dataset for phylogenetic tree reconstruction. Additionally, unique and duplicated sequences in an organism can be identified.
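The two core steps described above, BBH ortholog detection and concatenation of aligned orthologs, can be illustrated with a minimal Python sketch. This is not CAT-O's own Perl code. It assumes that hits are already available as (query, subject, score) tuples, for example parsed from BLAST output. It also assumes that hits are ranked by bit score and that an organism missing from an ortholog group is padded with gaps. All function names here are hypothetical.

```python
def best_hits(hits):
    """Return {query: subject} keeping the highest-scoring subject per query.

    `hits` is an iterable of (query, subject, score) tuples (assumed format).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def concatenate_alignments(alignments, organisms):
    """Join per-ortholog alignments into one sequence per organism.

    `alignments` is a list of {organism: aligned_sequence} dicts, one per
    ortholog group. Organisms absent from a group are padded with gaps
    (an assumption of this sketch, not a documented CAT-O behaviour).
    """
    parts = {org: [] for org in organisms}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("aligned sequences must have equal length")
        width = lengths.pop()
        for org in organisms:
            parts[org].append(aln.get(org, "-" * width))
    return {org: "".join(chunks) for org, chunks in parts.items()}


# Toy data: a2's best hit is b2, but b2's best hit is a1, so only
# (a1, b1) is reciprocal.
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0), ("a2", "b2", 90.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a1", 140.0), ("b2", "a2", 95.0)]
orthologs = bidirectional_best_hits(a_vs_b, b_vs_a)

genes = [{"X": "AC-G", "Y": "ACTG"}, {"X": "TT", "Y": "T-"}]
supermatrix = concatenate_alignments(genes, ["X", "Y"])
```

In this toy example, `orthologs` contains only the pair `("a1", "b1")`, and `supermatrix` maps `X` to `AC-GTT` and `Y` to `ACTGT-`. Requiring the best hit to hold in both directions is what separates a BBH pair from a one-way match.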

 
References
 

Brown JR, Douady CJ, Italia MJ, Marshall WE, Stanhope MJ (2001) Universal trees based on large combined protein sequence data sets. Nat Genet 28:281-285.

Baldauf SL, Roger AJ, Wenk-Siefert I, Doolittle WF (2000) A kingdom-level phylogeny of eukaryotes based on combined protein data. Science 290:972-977.

Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389-3402.

Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792-1797.

Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N (1999) The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA 96:2896-2901.

 
 