[Bioclusters] Apple/Genentech's new version of Blast
chris dagdigian
bioclusters@bioinformatics.org
Fri, 22 Nov 2002 09:44:19 -0500
bioinfo wrote:
> Hello Everyone,
> Has anyone had a chance to take a look at Apple/Genentech's new
> version of Blast? Are the performance gains as great as they say?
We are in the middle of a set of benchmark runs on Linux/Intel cluster
nodes and Apple Xserves, using both 'normal NCBI blast' and the
altivec-enhanced 'agblast', plus the new version of altivec-enhanced
HMMER that I mentioned on this list a while back. (Altivec HMMER is
still not properly on our website, so email me if anyone wants the
source code or a pre-packaged installer.) The work is being done for a
client who is about to make a big cluster purchase decision, and we are
trying to secure permission to make the figures public after we deliver
them.
To set the record straight, we (bioteam) did not do any of the work on
the AG-BLAST project itself. Our involvement comes about because Bill
Van Etten from our group was the person who did the original port of
the ncbi-blast codebase so that it would work on MacOS X. He gave the
patches to NCBI, which incorporated them into the codebase. Using that
code, Apple and Genentech were able to add the altivec optimizations.
To answer some of Mark's questions:
1. Yep, the performance gains are real for blastn. Especially cool is
the ability to change word sizes without paying an unreasonable
performance penalty in return. It is very nice.
2. People still need to benchmark their typical 'use cases' on the
Xserve -- some people find them perfect for what they want to do and
others don't quite see amazing results. I remember hearing from Brian
Gillman at the Whitehead that in some of his testing he found that
altivec-blastn was not all that much faster for his specific needs.
3. One thing to remember is that the Xserve has a physical RAM limit of
2GB, and this may hurt people who need to search very large DBs all the
time. I don't consider this bad from a blast farming perspective, as 2GB
is what I'd put in a Linux cluster node anyway, but it is something to
keep in mind.
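For anyone who wants to try point 2 on their own data, here is a
minimal sketch of a timing harness that runs the same blastn query at
several word sizes. It assumes the legacy NCBI 'blastall' command line
(-p, -d, -i, -o, -W, -a) and assumes that an altivec build takes the
same flags under whatever binary name you pass in. The function names,
the output file naming and the 'binary' parameter are made up for
illustration.

```python
import subprocess
import time


def blastn_command(query, db, word_size, out, binary="blastall"):
    """Build a single-CPU blastn command line (legacy blastall flags)."""
    return [
        binary,
        "-p", "blastn",        # program to run
        "-d", db,              # formatted database name
        "-i", query,           # query FASTA file
        "-o", out,             # report output file
        "-W", str(word_size),  # word size under test
        "-a", "1",             # use one processor
    ]


def time_word_sizes(query, db, word_sizes, binary="blastall",
                    runner=subprocess.call, clock=time.time):
    """Run the same search once per word size; return {word_size: seconds}.

    'runner' takes an argv list and returns an exit status. It defaults
    to subprocess.call and can be swapped for a stub when testing.
    """
    timings = {}
    for w in word_sizes:
        out = "bench_W%d.out" % w  # hypothetical output file name
        cmd = blastn_command(query, db, w, out, binary=binary)
        start = clock()
        status = runner(cmd)
        elapsed = clock() - start
        if status != 0:
            raise RuntimeError("search failed for word size %d" % w)
        timings[w] = elapsed
    return timings
```

Calling time_word_sizes("myquery.fa", "mydb", [7, 11, 20]) once per
platform, with your own query and database, gives wall-clock figures
you can compare directly. Run each case more than once so disk caching
does not skew the first result.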
Elia also mentioned that 'blastn does not work as well on 2-CPUs' and
I'd have to echo that thought from my experiences in years past at
Genetics Institute. I _always_ got better blast throughput by
constraining the blast search to run only on a single CPU. Most of the
people I know who do blast farms are doing the same thing, I believe --
they constrain each blast search to a single CPU and keep both
processors of a dual-CPU machine busy by running 2 searches at a time.
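As a rough sketch of that setup: each search is pinned to one processor
with blastall's -a flag, and a small worker pool keeps two searches
running at once on a dual-CPU node. This assumes the legacy NCBI
'blastall' command line. The function names, the 'slots' parameter and
the '.out' naming are hypothetical, and a real farm would normally
leave the scheduling to its queueing system.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def single_cpu_command(query, db, out, program="blastn"):
    """Build a blastall command line restricted to one processor."""
    return [
        "blastall",
        "-p", program,  # blast program to run
        "-d", db,       # formatted database name
        "-i", query,    # query FASTA file
        "-o", out,      # report output file
        "-a", "1",      # one processor per search
    ]


def run_farm(queries, db, slots=2, runner=subprocess.call):
    """Run one single-CPU search per query, 'slots' searches at a time.

    Returns {query_file: exit_status}. 'runner' takes an argv list and
    returns an exit status; it can be replaced with a stub for testing.
    """
    def run_one(query):
        out = query + ".out"  # hypothetical output naming
        return query, runner(single_cpu_command(query, db, out))

    # Threads only wait on child processes here, so two workers keep
    # two blast searches in flight at once.
    with ThreadPoolExecutor(max_workers=slots) as pool:
        return dict(pool.map(run_one, queries))
```

With slots=2 on a dual-CPU box, run_farm(["a.fa", "b.fa", "c.fa"],
"mydb") starts two searches at once and launches the third as soon as
one of them finishes.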
-Chris
--
Chris Dagdigian, <dag@sonsorol.org>
Bioteam Inc. - Independent Bio-IT & Informatics consulting
Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
PGP KeyID: 83D4310E Yahoo IM: craffi Web: http://bioteam.net