[Bioclusters] a dedicated cluster to mpiblast the nr database

Tim Cutts bioclusters@bioinformatics.org
Fri, 5 Dec 2003 09:08:24 +0000


On 04-Dec-03, Mike Cariaso wrote:
> I'm looking to put together a small cluster for a very
> large number of blasts against a local copy of the
> ncbi nr database.
> 
> I am under the impression the best thing I can do is
> to get it all in memory, for which I'm estimating 24GB
> should cover me. 

Of which only about 3 GB can be addressed by any single BLAST process if
you buy a 32-bit x86 machine; for the rest of the database you're
relying on the disk cache.

> I've found a vendor for 3 dual 2.4G Xeon each with
> 8GB (ram: DDR 8x1G ecc pc2100). The total price is
> about $12k for the 3 of them.

Alternatively, spend the same amount on 6 dual Xeon machines with, say,
4 GB of RAM each.  Same money, twice as many CPUs, so even if the BLASTs
are 30% slower on the smaller machines due to less caching, you win in
terms of total throughput.  We have a blast farm which consists of
almost 800 single CPU blades.  Individually, they're pretty slow
(they're only 800 MHz Pentium III with 1 GB RAM), but the throughput
they can achieve together is superb, especially since there are 24 of
them per 3U of rack space.
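
As a rough sketch of the arithmetic above: this assumes throughput scales
linearly with CPU count, and that "30% slower" means each CPU gets through
0.7 times as many searches per unit time.  Both are simplifying
assumptions, and the function name is made up for illustration.

```python
def relative_throughput(machines, cpus_per_machine, per_cpu_speed):
    """Aggregate throughput in arbitrary units, assuming linear scaling."""
    return machines * cpus_per_machine * per_cpu_speed

# 3 dual-CPU nodes with 8 GB each, taken as the per-CPU speed baseline (1.0)
big = relative_throughput(3, 2, 1.0)

# 6 dual-CPU nodes with 4 GB each, each CPU assumed 30% slower (0.7)
small = relative_throughput(6, 2, 0.7)

print(f"3 x 8GB nodes: {big:.1f}")
print(f"6 x 4GB nodes: {small:.1f}")
print(f"ratio: {small / big:.2f}")
```

Under those assumptions the six smaller machines come out about 1.4 times
ahead.  Real numbers depend on how well the database fits the cache, which
is why testing on eval hardware matters.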

> Any advice, experience, or warnings would be greatly appreciated.

If you can, you really need to *try* some example jobs on the two
different setups, and see which is more cost effective.  Get some eval
boxes and try it out.

We've done this recently, comparing 32-bit BLAST on Linux/x86, with 4GB
RAM, against various vendors' quad-CPU-64-bit-32GB-RAM performance
monsters.  For BLAST, in particular, I don't think big machines are
worth it.  The 20-40% speed advantage is much smaller than the price
penalty.  Buy lots of little ones instead.  If you need a more generic
high performance compute resource which requires much larger amounts of
memory (and there are plenty of bioinformatics tasks which do, it's just
that BLAST isn't really one of them), that's when you start considering
the big memory boxes.
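
To make the cost argument concrete, here is a minimal sketch of the
break-even reasoning.  It assumes throughput per unit of money is the only
thing being compared.  The price ratio of 3.0 is purely hypothetical, chosen
for illustration; it is not a figure from the evaluation, and the function
names are invented.

```python
def breakeven_price_ratio(speed_advantage):
    """Largest price multiple at which a faster box still matches the
    cheaper one on throughput per unit of money."""
    return 1.0 + speed_advantage

def throughput_per_cost(relative_speed, relative_price):
    """Throughput per unit of money, relative to the cheap box (1.0)."""
    return relative_speed / relative_price

# Hypothetical: the big box costs 3x as much for comparable CPU count.
HYPOTHETICAL_PRICE_RATIO = 3.0

for advantage in (0.20, 0.40):
    limit = breakeven_price_ratio(advantage)
    value = throughput_per_cost(1.0 + advantage, HYPOTHETICAL_PRICE_RATIO)
    print(f"{advantage:.0%} faster: break-even at {limit:.2f}x the price, "
          f"value at {HYPOTHETICAL_PRICE_RATIO:.1f}x the price = {value:.2f}")
```

Any value below 1.00 means the smaller machines give more BLAST throughput
for the same money.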

Just my 2¢

Tim

-- 
Dr Tim Cutts
Informatics Systems Group
Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK