[Bio-Linux] Blasting Multiple Fasta Files

Tony Travis tony.travis at minke-informatics.co.uk
Tue May 5 12:19:05 EDT 2015


On 05/05/15 16:08, Tim Booth wrote:
> [...]
> You want to run:
> 
> blastx -db foo -infile seqs_000000_to_000999.fsa -out seqs_000000_to_000999.blastx
> ...then...
> blastx -db foo -infile seqs_001000_to_001999.fsa -out seqs_001000_to_001999.blastx
> ...then...
> blastx -db foo -infile seqs_002000_to_002999.fsa -out seqs_002000_to_002999.blastx
> ...then...
> blastx -db foo -infile seqs_003000_to_003999.fsa -out seqs_003000_to_003999.blastx
> ...etc
> [...]

Hi, Tim.

It's not a good idea to run multiple instances of BLAST at once on the
same machine, because each instance will hold its own copy of the same
database in memory. MPI-BLAST avoids this by loading a different part
of the database into each worker process.

The time-consuming part of BLAST is the initial exact word match, and
both the old and new versions of BLAST let you specify how many
threads to run to speed this up:

  BLAST ("blastall")  uses "-a nn"
  BLAST+              uses "-num_threads nn"
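
As a rough sketch of what that could look like for the files in the
quoted example: run the chunks one after another, and let each run use
several threads. The database name "foo" and the "seqs_*.fsa" naming come
from the quoted message; THREADS, DRY_RUN and the run() helper are my own
illustrative names, not part of BLAST. Note that BLAST+ takes its input
file with "-query".

```shell
#!/bin/sh
# Sketch only: sequential, multi-threaded BLAST+ runs over chunked queries.
# THREADS and DRY_RUN are hypothetical settings made up for this example.
THREADS="${THREADS:-4}"
DRY_RUN="${DRY_RUN:-1}"     # 1 = print the commands, 0 = actually run them

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

for f in seqs_*.fsa; do
    [ -e "$f" ] || continue  # skip if the pattern matched no files
    # One BLAST process at a time, so only one copy of the database is
    # loaded; the parallelism comes from -num_threads.
    run blastx -db foo -query "$f" -out "${f%.fsa}.blastx" \
        -num_threads "$THREADS"
done
```

With the defaults this only prints the commands; something like
"DRY_RUN=0 THREADS=8 sh script.sh" would run them. With the old
"blastall" the equivalent would be along the lines of
"blastall -p blastx -d foo -i in.fsa -o out.blastx -a 8".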

I compared "blastall", "blastn", "blat", "pblat" and "bowtie" for
mapping microRNA and mRNA to a custom database in:

Travis, A. J., Moody, J., Helwak, A., Tollervey, D., & Kudla, G. (2013).
Hyb: A bioinformatics pipeline for the analysis of CLASH (crosslinking,
ligation and sequencing of hybrids) data. Methods (San Diego, Calif.).
http://doi.org/10.1016/j.ymeth.2013.10.015

["pblat" is a parallel/multi-threaded version of BLAT]

You will need a script like this one by Jonathan Moody to convert
"bowtie2" alignments to equivalent tabular BLAST output:

  https://github.com/gkudla/hyb/blob/master/bin/sam2blast

Bye,

  Tony.

-- 
Minke Informatics Limited, Registered in Scotland - Company No. SC419028
Registered Office: 3 Donview, Bridge of Alford, AB33 8QJ, Scotland (UK)
tel. +44(0)19755 63548                    http://minke-informatics.co.uk
mob. +44(0)7985 078324        mailto:tony.travis at minke-informatics.co.uk


