blast2malign

blast2malign is a Perl script that creates a multiple alignment (relative to the query) from the database hits of a BLAST search. Because every hit is placed in query coordinates, insertions in the database sequences relative to the query are not represented in the output.
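
As a rough illustration of what "relative to the query" means, the sketch below places one aligned hit onto query coordinates and drops any hit residues that sit opposite a gap in the query. It is not taken from blast2malign itself; the function name project_onto_query and its parameters are hypothetical, chosen only for this example.

```python
def project_onto_query(query_aln, hit_aln, query_start, query_len, gap="-"):
    """Place one aligned hit onto query coordinates (illustrative sketch only).

    query_aln, hit_aln: the two gapped rows of a pairwise alignment.
    query_start: 1-based query position where the alignment begins.
    query_len: full length of the query sequence.
    """
    if len(query_aln) != len(hit_aln):
        raise ValueError("aligned rows must have equal length")

    row = [gap] * query_len      # one output column per query residue
    pos = query_start - 1        # 0-based index of the current query column

    for q_char, h_char in zip(query_aln, hit_aln):
        if q_char == gap:
            # Insertion in the hit relative to the query: there is no
            # query column for it, so the hit residue is dropped.
            continue
        row[pos] = h_char        # may itself be a gap (deletion in the hit)
        pos += 1

    return "".join(row)


# The extra "G" in the hit has no query column and does not appear.
print(project_onto_query("ACG-TA", "ACGGT-", query_start=3, query_len=10))
```

Every row produced this way has exactly the length of the query, which is why rows from different hits can be stacked directly into one alignment.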

It is intended to be a quick, easy-to-use method for getting a fairly accurate multiple alignment for initial phylogenetic or other analyses.

A particular gene within a BLAST database sequence can be represented by multiple HSPs (high-scoring segment pairs) because of introns, insertions, frameshift errors (for TBLASTN), etc. For some purposes it is desirable to combine such HSPs into a single gene. On the other hand, some database sequences contain two or more completely different genes, each represented by one or more HSPs in the BLAST output. In these cases it is desirable to keep these homologous genes separate.

This script attempts to determine whether the HSPs for a particular database sequence correspond to the same gene or to different genes. Where possible, it sorts the HSPs into separate gene groups and represents each group as a separate sequence in the multiple alignment. HSPs belonging to one gene group are combined into a single sequence. If two HSPs from the same group hit the same region of the query sequence, the HSP with the highest average sequence identity takes priority in the final multiple alignment.
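
The priority rule for overlapping HSPs can be sketched as follows. This is a minimal illustration, not the script's actual code: it assumes each HSP has already been placed in query coordinates, and the function name merge_group and the dictionary keys (query_start, segment, identity) are hypothetical. How blast2malign decides which HSPs belong to the same gene group is not shown here; see the perldoc for that.

```python
def merge_group(hsps, query_len, gap="-"):
    """Combine the HSPs of one gene group into a single aligned row.

    Each HSP is a dict with hypothetical keys:
      query_start: 1-based query position where the HSP begins
      segment:     hit residues, one character per query column covered
      identity:    average sequence identity of the HSP (0.0 to 1.0)
    """
    merged = [gap] * query_len
    claimed = [False] * query_len   # columns already filled by a better HSP

    # Visit HSPs from highest to lowest identity, so that in overlapping
    # regions the higher-identity HSP fills the column first.
    for hsp in sorted(hsps, key=lambda h: h["identity"], reverse=True):
        start = hsp["query_start"] - 1
        for offset, char in enumerate(hsp["segment"]):
            col = start + offset
            if not claimed[col]:
                merged[col] = char
                claimed[col] = True

    return "".join(merged)


group = [
    {"query_start": 5, "segment": "XXQRQL", "identity": 0.6},
    {"query_start": 1, "segment": "MKTAYI", "identity": 0.9},
]
# Columns 5 and 6 are covered by both HSPs; the 0.9-identity HSP wins there.
print(merge_group(group, query_len=12))
```

Query columns that no HSP in the group covers stay as gap characters, so the merged row still spans the full query length.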

Once installed, type "perldoc blast2malign" for further documentation on the algorithm and its tunable parameters.

I'll put a form on this page when I get a chance so people can run it through their browser.

INSTALLATION:

  1. download the latest tarball => here
  2. tar zxvf blast2malign.VERSION.tar.gz
  3. cd blast2malign
  4. ./blast2malign blast-output-file expectation-cutoff > malign.fas
  5. optionally, copy blast2malign to a directory in your PATH (e.g. /usr/local/bin)

Danny Rice
Last modified: Wed May 17 15:57:40 EDT 2006