It is intended to be a quick, easy-to-use method for obtaining a reasonably accurate multiple alignment for initial phylogenetic or other analyses.
A particular gene within a BLAST database sequence can be represented by multiple HSPs because of introns, insertions, frameshift errors (for TBLASTN), etc. For some purposes it is desirable to combine such HSPs into a single gene. On the other hand, some database sequences contain two or more completely different genes, each represented by one or more HSPs in the BLAST output. In these cases it is desirable to keep these homologous genes separate.
This script attempts to determine whether the HSPs for a particular database sequence correspond to the same gene or to different genes. Where possible, it sorts the HSPs into separate gene groups and represents each group as a separate sequence in the multiple alignment. HSPs in the same gene group are combined into a single sequence. If two HSPs from the same group hit the same region of the query sequence, the HSP with the highest average sequence identity takes priority in the final multiple alignment.
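The rule for combining HSPs within one gene group can be illustrated with a minimal sketch. This is written in Python, not taken from the script's Perl source. The `HSP` class, its field names, and `merge_gene_group` are hypothetical. The sketch also assumes that each HSP's subject residues have already been projected onto query coordinates, so insertions relative to the query are dropped. How the script decides which HSPs belong to the same group is not modelled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HSP:
    # Hypothetical representation of one HSP, already projected onto the query.
    q_start: int      # 1-based query position where the HSP begins
    subject: str      # subject residues, one per query column ('-' = gap)
    identity: float   # average percent sequence identity of the HSP


def merge_gene_group(hsps: List[HSP], query_len: int) -> str:
    """Combine one gene group's HSPs into a single query-length row.

    Where HSPs overlap on the query, the HSP with the higher average
    identity keeps its residues (or gaps) in the overlapping columns.
    """
    row = ["-"] * query_len
    claimed = [False] * query_len
    # Visit HSPs from highest to lowest identity so stronger HSPs claim
    # overlapping query columns before weaker ones.
    for hsp in sorted(hsps, key=lambda h: h.identity, reverse=True):
        for offset, residue in enumerate(hsp.subject):
            col = hsp.q_start - 1 + offset
            if not claimed[col]:
                row[col] = residue
                claimed[col] = True
    return "".join(row)


# Three HSPs from one database sequence; the first two overlap on query columns 4-5.
group = [HSP(1, "MKVLA", 80.0), HSP(4, "ITGRS", 60.0), HSP(10, "WY", 70.0)]
print(merge_gene_group(group, 12))  # MKVLAGRS-WY-
```

In the overlap, the 80% identity HSP keeps `LA` and the 60% HSP contributes only its non-overlapping `GRS`. Query columns covered by no HSP stay as gaps.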
Once it is installed, type "perldoc blast2malign" for further documentation on the algorithm and on tuning its parameters.
I'll put a form on this page when I get a chance so people can run it through their browser.
INSTALLATION: