BranchClust: A Phylogenetic Algorithm for Selecting Gene Families

BranchClust is an algorithm for the automated selection of orthologous genes: it recognizes orthologs from different species in a phylogenetic tree for any number of taxa. It distinguishes complete families (containing all taxa) from incomplete families (not containing all taxa) and recognizes in- and out-paralogs.

 

Maria S Poptsova  and J Peter Gogarten

BMC Bioinformatics 2007, 8:120

Free access: http://www.biomedcentral.com/1471-2105/8/120
 

 BranchClust Tutorial - a step-by-step guide for assembling orthologous gene families

 

 Algorithm

 

BranchClust is a clustering algorithm that parses trees in order to delineate families of orthologs within a superfamily containing several paralogous gene families. The underlying idea is that closely related genes are placed on one branch emerging from one node of a tree, so detecting families for n different taxa reduces to detecting branches that contain groups of genes from all, or almost all, species.
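The idea of "branches containing genes from all, or almost all, species" can be illustrated with a small Python sketch. This is not the BranchClust implementation (the program itself is distributed as Perl) and it does not reproduce its rules for rooting, paralogs, or thresholds. It assumes a rooted tree written as nested tuples, leaves labeled with a hypothetical "taxon|gene_id" convention, and a hypothetical min_taxa parameter for "almost all".

```python
# Illustrative sketch only: a tree is either a leaf string "taxon|gene_id"
# (a made-up labeling convention) or a tuple of subtrees.

def leaves(node):
    """Return all leaf labels below a node."""
    if isinstance(node, str):
        return [node]
    out = []
    for child in node:
        out.extend(leaves(child))
    return out

def taxa_of(node):
    """Return the set of distinct taxa represented below a node."""
    return {leaf.split("|")[0] for leaf in leaves(node)}

def select_branches(node, min_taxa):
    """Return the smallest subtrees that still cover at least min_taxa taxa."""
    if len(taxa_of(node)) < min_taxa:
        return []              # too few species on this branch
    if isinstance(node, str):
        return [node]          # only reachable when min_taxa <= 1
    found = []
    for child in node:
        found.extend(select_branches(child, min_taxa))
    # If no child covers enough taxa on its own, this node is the branch.
    return found if found else [node]

# Two paralogous families for taxa A, B, C; taxon C has two genes in the second.
example_tree = (
    (("A|a1", "B|b1"), "C|c1"),
    (("A|a2", "B|b2"), ("C|c2", "C|c3")),
)

for branch in select_branches(example_tree, min_taxa=3):
    print(sorted(leaves(branch)))
```

In this toy tree each of the two top-level branches covers all three taxa, so each is reported as one candidate family. The second branch holds two genes from taxon C, which is the kind of situation the in-paralog reporting addresses.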


 

 Clustering

 

 

Superfamily of ATP-synthases for 30 taxa: 16 bacteria and 14 archaea.

 

ATP-A designates all catalytic subunits, whether from bacteria or archaea (subunit A); ATP-B designates all non-catalytic subunits (subunit B); ATP-F designates the flagellum-specific ATP synthase.

 

BranchClust output:

------------ CLUSTER -----------

56421917 16080761 15606215 21673103 39998198 39933373 15600432 32473454 32141261 62390087 21225334 55981034 15806355 15644219 32475544 15606716

------------ FAMILY ------------

15606215 16080761 21673103 62390087 15806355 56421917 39998198 15600432 32473454 39933373 32141261 15644219 55981034

INCOMPLETE: 13

>>>>> IN-PARALOGS -----------

21225334

<<<<< OUT-OF_CLUSTER PARALOGS -----------

32475544 15606716

 

------------ CLUSTER -----------

15644360 32476315 21674843 39995222 39933255 17227501 37522474 56421895 16080736 55820565 62390098 21223731 15606090 15600749 20091265 32473399

------------ FAMILY ------------

15606090 16080736 21674843 62390098 56421895 39995222 37522474 20091265 17227501 15600749 32476315 39933255 55820565 21223731 15644360

INCOMPLETE: 15

>>>>> IN-PARALOGS -----------

32473399

<<<<< OUT-OF_CLUSTER PARALOGS -----------

 

------------ CLUSTER -----------

20091272 21673859 32473392 15607015 15644358 15600747 37522139 17232531 21675043 39995224 39933253 32476317 56421893 16080734 55820567 62390100 21223733

------------ FAMILY ------------

15607015 16080734 21673859 62390100 56421893 39995224 37522139 20091272 17232531 15600747 32473392 39933253 55820567 21223733 15644358

INCOMPLETE: 15

>>>>> IN-PARALOGS -----------

21675043 32476317

<<<<< OUT-OF_CLUSTER PARALOGS -----------

 

------------ CLUSTER -----------

55981241 15805728 16081191 14521959 57641538 20095109 15678972 45358608 15790972 55379722 11498767 20092952 14600684 15897485 18312435 41615057

------------ FAMILY ------------

14600684 11498767 15805728 55379722 15790972 45358608 20095109 20092952 15678972 41615057 18312435 14521959 15897485 57641538 16081191 55981241

INCOMPLETE: 16

 

------------ CLUSTER -----------

11498766 20092951 15790973 55379721 16081190 55981242 15805727 20094453 14521960 57641537 45358607 15678973 14600685 15897484 41614899 18312083

------------ FAMILY ------------

14600685 11498766 15805727 55379721 15790973 45358607 20094453 20092951 15678973 41614899 18312083 14521960 15897484 57641537 16081190 55981242

INCOMPLETE: 16

 

------------ CLUSTER -----------

56419757 16078687 39995521 39934703 15596301 32477553 15642991 15596894

------------ FAMILY ------------

16078687 56419757 39995521 15596301 32477553 39934703 15642991

INCOMPLETE: 7

>>>>> IN-PARALOGS -----------

<<<<< OUT-OF_CLUSTER PARALOGS -----------

15596894
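A listing like the one above can be read into a simple data structure for further processing. The following Python sketch is a reading aid and not part of BranchClust. It assumes only what the sample output shows: section markers named CLUSTER, FAMILY, IN-PARALOGS and OUT-OF_CLUSTER PARALOGS, whitespace-separated gene identifiers, and a status line of the form "INCOMPLETE: N". The function name and dictionary keys are hypothetical.

```python
# Section names as they appear in the sample output, mapped to
# hypothetical dictionary keys chosen for this sketch.
SECTIONS = {
    "CLUSTER": "cluster",
    "FAMILY": "family",
    "IN-PARALOGS": "in_paralogs",
    "OUT-OF_CLUSTER PARALOGS": "out_paralogs",
}

def parse_branchclust(text):
    """Parse BranchClust-style text output into a list of dicts."""
    records = []
    current = None
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Strip the decorative '-', '<', '>' runs around section names.
        name = line.strip("-<> ")
        if name in SECTIONS:
            section = SECTIONS[name]
            if section == "cluster":
                # A CLUSTER marker starts a new record.
                current = {"cluster": [], "family": [], "in_paralogs": [],
                           "out_paralogs": [], "status": None, "size": None}
                records.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first CLUSTER marker
        if ":" in line:
            # Status line such as "INCOMPLETE: 13".
            status, _, size = line.partition(":")
            current["status"] = status.strip()
            if size.strip().isdigit():
                current["size"] = int(size)
            continue
        # Otherwise the line holds gene identifiers for the open section.
        current[section].extend(line.split())
    return records
```

For the first cluster shown above, such a parser would give a cluster of 16 identifiers split into 13 family members, 1 in-paralog, and 2 out-of-cluster paralogs.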

 


 

 Program

 

Download

 

HowTos and Examples

 

 BranchClust Tutorial

 

Download Perl: www.perl.org

 

Download BioPerl: www.bioperl.org

 

 Links

 

Gogarten Lab Home Page: http://gogarten.uconn.edu/

 

Email to: Maria.Poptsova@gmail.com

 


Page last updated: May 16, 2007