[Bioclusters] Questions on mpiBLAST

Thu Feb 3 14:05:32 EST 2005

You rang .... :)

Brodie, Kent wrote:
> Q: can someone point me to the results obtained by Joe Landman?  (web
> site, or..?)
> 
> Many thanks,  -- Kent C. Brodie, Medical College of Wisconsin
> 
> 
> 
> 
>>-----Original Message-----
>>From: bioclusters-bounces+brodie=mcw.edu at bioinformatics.org
>>[mailto:bioclusters-bounces+brodie=mcw.edu at bioinformatics.org] On Behalf
>>Of Chris Dagdigian
>>Sent: Thursday, February 03, 2005 12:28 PM
>>To: Hrishikesh Deshmukh; Clustering, compute farming & distributed
>>computing in life science informatics
>>Subject: Re: [Bioclusters] Questions on mpiBLAST
>>
>>
>>"parallelizing" blast across cluster nodes only results in significant
>>speed gains if you are trying to solve a large problem set or have a
>>massive target database that in no way shape or form can squeeze into
>>physical memory on one node.
>>
>>The performance of BLAST is rate-limited first by how much RAM you have
>>and then by how fast your disk I/O system is.
>>
>>I think Joe Landman has also seen incredible variations in blast
>>performance by experimenting with non-GNU architecture optimized
>>compilers like those from IBM, Intel and the Portland Group.
>>
>>16 machines with 2 GB of RAM reading database files off of
>>Ethernet-based NFS is a "normal" compute farm config.
>>
>>Outside of mpiblast you could be seeing performance lags caused by your
>>network (if you are reading/writing via NFS or AFP) or by physical
>>memory.
>>
>>I'm not an expert on mpiblast, but I hope to start a personal project
>>soon to integrate it with Grid Engine, mostly to satisfy my own
>>curiosity.
>>
>>I agree with what Hrishikesh said about your times -- you are searching
>>with a very small query set and you did not mention your target database.
>>
>>You may see better performance using one machine -- the first query will
>>be slow but the other queries will come back faster since most or part
>>of the target database will still be mmapped or otherwise cached in RAM.
>>
>>If you really want to test mpiblast out you need to pick a much larger
>>query and target DB set.
>>
>>-Chris
>>
>>
>>
>>
>>Hrishikesh Deshmukh wrote:
>>
>>
>>>Hi,
>>>I am no authority on BLAST, but I guess you see a linear speedup
>>>only when the problem is huge; for 20-odd sequences mpiblast doesn't
>>>help, and your NCBI BLAST is good enough! Just curious: do the results
>>>for NCBI BLAST and mpiBLAST match exactly for the same dataset (input)?
>>>
>>>I am trying to get BLAST and mpiBLAST running on Sun Grid; right now
>>>BLAST works in serial mode and mpiBLAST is kind of stuck!
>>>
>>>Cheers,
>>>Hrishi
>>>
>>>
>>>On Thu, 03 Feb 2005 11:45:45 -0500, Xiaowu Gai
>>><xgai at genome.chop.edu> wrote:
>>
>>>>Hi Everyone:
>>>>
>>>>We have a 16-node Xserve cluster, with 2 GB of memory on each node and
>>>>dual processors.  I was able to install mpiBLAST on it, along with
>>>>LAM/MPI.  However, the performance that I saw with some test runs has
>>>>not been that good and is quite confusing.  Here is what I did:
>>>>
>>>>1.) I formatted the nt database:
>>>>
>>>>mpiformatdb -N 16 -i nt
>>>>
>>>>2.) I ran mpiblast on one, two, five, ten, twenty, and more sequences
>>>>(about 500bp each) with the command:
>>>>
>>>>time mpirun N mpiblast -p blastn -d nt -i single.fa -o blast_results.
>>>>
>>>>Here are the numbers:
>>>>
>>>>Single: 1m39.054s
>>>>Two: 0m11.009s
>>>>Five: 0m16.021s
>>>>Ten: 0m46.591s
>>>>twenty: 3m7.541s
>>>>..
>>>>
>>>>I am all confused.  First of all, the performance is not that
>>>>impressive.  Secondly, the numbers are very confusing to me.  Why is it
>>>>that a single-sequence query takes so much more time than a query of
>>>>two (BTW, I reran the query of a single sequence right after the query
>>>>of two and got similar results)?  And a query of five takes only 5
>>>>seconds more than the query of two, and so on.  I am afraid that I have
>>>>done something wrong and would really appreciate any thoughts.
>>>>
>>>>Thanks
>>>>
>>>>Xiaowu
>>>>
>>>>_______________________________________________
>>>>Bioclusters maillist  -  Bioclusters at bioinformatics.org
>>>>https://bioinformatics.org/mailman/listinfo/bioclusters
>>>>
>>>
>>
>>--
>>Chris Dagdigian, <dag at sonsorol.org>
>>BioTeam  - Independent life science IT & informatics consulting
>>Office: 617-665-6088, Mobile: 617-877-5498, Fax: 425-699-0193
>>PGP KeyID: 83D4310E iChat/AIM: bioteamdag  Web: http://bioteam.net
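
One hedged way to read the timings quoted above is to convert them to
seconds and divide by the number of query sequences.  This is only a
sketch: the names to_seconds, per_query and RUNS are illustrative and
are not part of mpiBLAST or LAM/MPI.  The wall-clock values are copied
from Xiaowu's message; everything else is an assumption.

```python
import re

def to_seconds(stamp):
    """Convert a `time`-style stamp such as '1m39.054s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# Number of query sequences -> wall-clock time reported in the thread.
RUNS = {
    1: "1m39.054s",
    2: "0m11.009s",
    5: "0m16.021s",
    10: "0m46.591s",
    20: "3m7.541s",
}

def per_query(runs):
    """Return the average wall-clock seconds per query for each run."""
    return {n: to_seconds(stamp) / n for n, stamp in runs.items()}

if __name__ == "__main__":
    for n, avg in per_query(RUNS).items():
        total = to_seconds(RUNS[n])
        print(f"{n:>2} queries: {total:8.3f} s total, {avg:7.3f} s per query")
```

The average cost per query falls from about 99 s for one sequence to
about 3.2 s for five, then rises to about 9.4 s for twenty.  With so
few queries the averages do not follow a steady trend, which fits the
advice above to test with a much larger query and target set.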

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615