On 04-Dec-03, Mike Cariaso wrote:
> I'm looking to put together a small cluster for a very
> large number of blasts against a local copy of the
> ncbi nr database.
>
> I am under the impression the best thing I can do is
> to get it all in memory, for which I'm estimating 24GB
> should cover me.

Of which only 3 GB can be used by any single BLAST job if you buy an
x86 machine; you're relying on disk caching for all the rest of it.

> I've found a vendor for 3 dual 2.4G Xeon each with
> 8GB (ram: DDR 8x1G ecc pc2100). The total price is
> about $12k for the 3 of them.

Alternatively, spend the same amount on 6 dual Xeon machines with, say,
4GB of RAM each. Same money, twice as many CPUs, so even if the BLASTs
are 30% slower on the smaller machines due to less caching, you win in
terms of total throughput.

We have a BLAST farm which consists of almost 800 single-CPU blades.
Individually, they're pretty slow (they're only 800 MHz Pentium III
with 1 GB RAM), but the throughput they can achieve together is superb,
especially since there are 24 of them per 3U of rack space.

> Any advice, experience, or warnings would be greatly appreciated.

If you can, you really need to *try* some example jobs on the two
different setups, and see which is more cost effective. Get some eval
boxes and try it out. We've done this recently, comparing 32-bit BLAST
on Linux/x86 with 4GB RAM against various vendors' quad-CPU, 64-bit,
32GB-RAM performance monsters.

For BLAST in particular, I don't think big machines are worth it. The
20-40% speed advantage is much smaller than the price penalty. Buy lots
of little ones instead.

If you need a more generic high performance compute resource which
requires much larger amounts of memory (and there are plenty of
bioinformatics tasks which do, it's just that BLAST isn't really one of
them), that's when you start considering the big memory boxes.
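
To put rough numbers on the throughput argument above, here is a minimal sketch in Python. It assumes one BLAST job per CPU, throughput that scales linearly with CPU count, and that "30% slower" means each CPU completes jobs at 0.7 times the rate of a CPU in the larger-memory machines. The function name and the relative-speed figures are illustrative only, not measurements.

```python
def farm_throughput(machines, cpus_per_machine, relative_speed):
    """Aggregate throughput in units of 'one big-memory CPU'.

    relative_speed is the per-CPU job rate relative to a CPU in the
    8GB machines (1.0 = same speed). This is an assumed model, not a
    benchmark result.
    """
    return machines * cpus_per_machine * relative_speed


# Option A: 3 dual-CPU machines with 8GB each (baseline speed).
big = farm_throughput(machines=3, cpus_per_machine=2, relative_speed=1.0)

# Option B: 6 dual-CPU machines with 4GB each, assumed 30% slower per CPU.
small = farm_throughput(machines=6, cpus_per_machine=2, relative_speed=0.7)

print(f"3 x 8GB machines: {big:.1f} CPU-equivalents")
print(f"6 x 4GB machines: {small:.1f} CPU-equivalents")
print(f"ratio small/big:  {small / big:.2f}")
```

Under these assumptions the six smaller machines deliver roughly 8.4 CPU-equivalents against 6 for the three larger ones. With twice the CPU count, the smaller machines stay ahead until the per-CPU slowdown reaches 50%, which is why measuring the real slowdown on eval boxes matters.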

Just my 2¢

Tim

-- 
Dr Tim Cutts
Informatics Systems Group
Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1SA, UK