[Bioclusters] BLAST Memory Benchmarks revisited
darling at cs.wisc.edu
Wed Dec 1 17:40:32 EST 2004
Juan Carlos Perin wrote:
>Second, to accommodate this memory restriction, and perhaps test this on our
>own, I was considering removing a CPU as to force all the memory slots to be
>allocated to a single CPU. I am wondering if this would actually work? Or
>if the architecture is actually segregated and each 4 slots is for an
I don't know the specifics of the G5 architecture, but I'd be surprised
if removing a CPU had any effect on the behavior of memory
allocation as it pertains to BLAST searches. I'm fairly certain that
BLAST uses memory-mapped file I/O on OS X to access the blast databases,
which means that it's relying on the OS to store the database in memory
in the file system buffer cache. Unless OS X has CPU-specific buffer
caches, which I doubt, all installed memory is available to cache your
BLAST db regardless of the number of CPUs.
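To illustrate the memory-mapped access pattern described above, here is a minimal Python sketch. It is not BLAST's code; the function scan_with_mmap and the toy file contents are hypothetical. The file is never explicitly read into a buffer. Pages are faulted in by the OS on first touch and kept in its file system cache, so how much of the file stays resident depends on total RAM, not on which CPU does the work.

```python
import mmap
import os
import tempfile


def scan_with_mmap(path: str, pattern: bytes) -> int:
    """Count (possibly overlapping) occurrences of pattern in a memory-mapped file.

    No read() into a private buffer: pages are faulted in by the OS on first
    access and kept in its file system cache, so a repeat scan of a file that
    fits in RAM need not go back to the disk.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file (which must be non-empty), read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count = 0
            pos = mm.find(pattern)
            while pos != -1:
                count += 1
                pos = mm.find(pattern, pos + 1)
            return count


if __name__ == "__main__":
    # Toy stand-in for a database file; real BLAST databases use their own binary format.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"ACGTACGTNNACGT")
        path = tmp.name
    try:
        print(scan_with_mmap(path, b"ACGT"))  # prints 3
    finally:
        os.remove(path)
```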
Last I checked, the nt database weighed in at over 3GB. When blasting
against nt on a 4GB system the entire DB can be cached by the OS,
whereas a 2GB system has to keep re-reading parts of the database from
disk as they are needed, which is much slower.
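The 4GB vs. 2GB comparison boils down to simple arithmetic: the database stays cached only if it fits in RAM after whatever the OS and other processes need. A rough sketch of that check follows; the 3.25 GiB database size and 512 MiB overhead are hypothetical figures chosen for illustration, not measurements.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def fits_in_cache(db_bytes: int, ram_bytes: int, reserved_bytes: int) -> bool:
    """Rough check of whether a BLAST database can stay resident in the OS cache.

    reserved_bytes approximates memory used by the kernel, BLAST itself and any
    other processes; that figure is an assumption you must estimate yourself.
    """
    return db_bytes <= ram_bytes - reserved_bytes


if __name__ == "__main__":
    nt_bytes = 3 * GIB + 256 * MIB  # hypothetical size, "over 3GB"
    reserved = 512 * MIB            # hypothetical overhead
    print(fits_in_cache(nt_bytes, 4 * GIB, reserved))  # True
    print(fits_in_cache(nt_bytes, 2 * GIB, reserved))  # False
```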
In order to get good performance searching nt, your options seem to be
(1) put 4GB RAM in each compute node, (2) use query concatenation for
blastn, or (3) use one of the several BLAST database segmentation
packages. Last February I posted a message describing many of these:
Since then btblastall has become another option.
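The idea behind database segmentation is to split the database into fragments small enough that each node's share fits in its RAM, so every node searches a cached fragment. A minimal sketch of the splitting step, assuming per-sequence sizes are already known; the function segment_database and its greedy in-order strategy are hypothetical and not necessarily how mpiBLAST or any other package does it.

```python
from typing import List, Tuple


def segment_database(seq_sizes: List[Tuple[str, int]],
                     max_fragment_bytes: int) -> List[List[str]]:
    """Split sequences, in order, into fragments whose total size stays under a budget.

    A new fragment starts whenever adding the next sequence would exceed
    max_fragment_bytes. A single sequence larger than the budget gets a
    fragment of its own.
    """
    fragments: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in seq_sizes:
        if current and current_bytes + size > max_fragment_bytes:
            fragments.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        fragments.append(current)
    return fragments


if __name__ == "__main__":
    seqs = [("a", 400), ("b", 300), ("c", 500), ("d", 200), ("e", 900)]
    print(segment_database(seqs, 1000))  # [['a', 'b'], ['c', 'd'], ['e']]
```

Cutting in order keeps the split trivially reproducible; a bin-packing heuristic could produce fewer fragments, at the cost of reordering the database.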
I help develop the mpiBLAST package. Since you are running a Mac
cluster you may be interested to know that another mpiBLAST user reports
that our most recent version 1.3.0 release candidate works well with the
Apple/Genentech BLAST optimizations on OS X.
With respect to your request for "hard evidence" that blast runs much
faster when the database fits in core memory, you may be interested in
the mpiBLAST ClusterWorld 2003 paper. The first figure shows the
increase in execution time and disk activity during blast searches as
the database grows larger than core memory.