[Bioclusters] Anyone interested in clustering transmeta cpus?

chris dagdigian dag@sonsorol.org
Sat, 25 Aug 2001 11:46:31 -0400

At 08:56 PM 8/24/01 -0400, Ivo Grosse wrote:

>... and even 1 GB per node might often be too small to run useful BLAST

Blackstone and others have tackled this problem by breaking up the databases 
into pieces that are small enough to be dynamically shipped peer-to-peer 
style around the network and cached in local RAM. This was a huge priority 
project at our company back when DRAM prices were very high :) (Memory 
used to account for 50% of the total cost of some of the servers we bought.) 
Now that memory is pretty darn cheap it is not as beneficial except in 
cases where you are forced to deal with low-memory hardware like the RLX 
blades.  The process is not that difficult, assuming you can get 
the statistics correct when you merge your split result sets back together. 
This approach is not suitable for scientists doing one-off searches 
against many databases... it works best when you know that you have to do 
many queries against a large db.
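
To make the "get the statistics correct" step concrete, here is a minimal 
sketch of one way to merge per-chunk hits. It is not necessarily how 
Blackstone or anyone else did it. It assumes that an E-value scales 
linearly with database length, so a hit found in a chunk is rescaled by 
(full length / chunk length), and it ignores effective-length corrections. 
The names Hit, rescale_evalue and merge_chunk_hits are made up for 
illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    subject_id: str
    bit_score: float
    evalue: float  # E-value as reported against a single chunk


def rescale_evalue(chunk_evalue: float, chunk_len: int, full_len: int) -> float:
    """Scale a chunk-local E-value up to the full database size.

    Assumption: E-values grow linearly with the database length searched.
    """
    return chunk_evalue * (full_len / chunk_len)


def merge_chunk_hits(hits_by_chunk: Dict[str, List[Hit]],
                     chunk_lengths: Dict[str, int]) -> List[Hit]:
    """Combine hits from every chunk into one list ranked by corrected E-value."""
    full_len = sum(chunk_lengths.values())
    merged = []
    for chunk, hits in hits_by_chunk.items():
        for h in hits:
            corrected = rescale_evalue(h.evalue, chunk_lengths[chunk], full_len)
            merged.append(Hit(h.subject_id, h.bit_score, corrected))
    merged.sort(key=lambda h: h.evalue)  # most significant hits first
    return merged
```

Without the rescaling, a hit from a small chunk would look more significant 
than the same hit would in a search of the whole database.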

It does work, though: we did some blazing fast searching on nodes with 
256 MB RAM using this approach.

> > o No possibility of a PCI slot; this rules out Myrinet and other high
> > speed interconnect technologies
>... is a fast communication between the nodes really important in
>bioinformatics applications, which are typically embarrassingly parallel?

No for bioinformatics; yes for other life science areas.

There is no need for a high-speed interconnect for bioinformatics and 
sequence analysis. As you said, most of those apps are embarrassingly 
parallel, and most in fact are rate limited by things like RAM and disk I/O.
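
As a rough illustration of why the interconnect matters so little here, 
the sketch below deals a list of queries out to nodes and runs each batch 
with no communication between batches. It is only a toy: threads stand in 
for cluster nodes, the search argument stands in for whatever per-query 
program you run, and the function names are made up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def partition(queries: Sequence[str], n_nodes: int) -> List[List[str]]:
    """Deal queries round-robin into one batch per node."""
    return [list(queries[i::n_nodes]) for i in range(n_nodes)]


def run_batch(batch: List[str], search: Callable[[str], str]) -> List[str]:
    """Run every query in a batch; no batch needs data from any other."""
    return [search(q) for q in batch]


def run_all(queries: Sequence[str], n_nodes: int,
            search: Callable[[str], str]) -> List[str]:
    """Run all batches concurrently and concatenate the per-node results."""
    batches = partition(queries, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        per_node = list(pool.map(lambda b: run_batch(b, search), batches))
    return [result for node_results in per_node for result in node_results]
```

The only traffic is handing out the queries and collecting the results, 
which is why each node's RAM and disk end up as the bottleneck.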

Once you start having researchers who want to do computational chemistry, 
molecular modeling, QSAR and virtual screening, then you start to see more 
and more emphasis on parallel code. There is some PVM stuff, but more and 
more commercial applications are coming out as MPI-aware. It also seems that 
many of the scientific software developers in these fields are deciding to 
start with MPI and parallelism.  Vertex Pharmaceuticals is an example of 
this case; they just replaced their 128-node SGI system with a 112-CPU 
Myrinet-enabled Linux cluster. The system is expected to go to 300+ CPUs 
within a year and they are pretty much going to use it entirely for 
proprietary parallel code that their researchers have cooked up in-house.