At 08:56 PM 8/24/01 -0400, Ivo Grosse wrote:
>... and even 1 GB per node might often be too small to run useful BLAST
>jobs.

Blackstone and others have tackled this problem by breaking up the databases into pieces that are small enough to be dynamically shipped peer-to-peer style around the network and cached in local RAM. This was a huge priority project at our company back when DRAM prices were very, very high :) (Memory used to account for 50% of the total cost of some of the servers we bought.)

Now that memory is pretty darn cheap, it is not as beneficial except in cases where you are forced to deal with low-memory hardware like the RLX blades.

The process to do this is not that difficult, assuming you can get the statistics correct when you merge your split result sets back together. This approach is not suitable for scientists doing one-off searches against many databases; it works best when you know that you have to do many queries against a large db.

It does work, though: we did some blazing fast searching on nodes with 256 MB RAM using this approach.

> > o No possibility of a PCI slot; this rules out Myrinet and other high
> > speed interconnect technologies
>
>... is a fast communication between the nodes really important in
>bioinformatics applications, which are typically embarrassingly parallel?

No for bioinformatics; yes for other life science areas.

There is no need for a high-speed interconnect for bioinformatics and sequence analysis. As you said, most of those apps are embarrassingly parallel, and most are in fact rate-limited by things like RAM and disk I/O.

Once you start having researchers who want to do computational chemistry, molecular modeling, QSAR and virtual screening, you start to see more and more emphasis on parallel code. There is some PVM stuff, but more and more commercial applications are coming out as MPI-aware. It also seems that many of the scientific software developers in these fields are deciding to start with MPI and parallelism.

Vertex Pharmaceuticals is an example of this case; they just replaced their 128-node SGI system with a 112-CPU Myrinet-enabled Linux cluster. The system is expected to go to 300+ CPUs within a year, and they are pretty much going to use it entirely for proprietary parallel code that their researchers have cooked up in-house.

-Chris
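
A rough sketch of the "get the statistics correct" step when merging results from a split database, for anyone who wants to see the idea in code. It assumes the usual Karlin-Altschul form, where the E-value scales linearly with the size of the search space, so a hit's E-value computed against one piece can be rescaled by (total residues / piece residues). It ignores edge-effect and effective-length corrections, and the names here (Hit, rescale_evalue, merge_hits) are made up for illustration. This is not how Blackstone or anyone else actually implemented it.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Hit:
    """One search hit (hypothetical record, not a real BLAST structure)."""
    subject_id: str
    bit_score: float
    evalue: float  # E-value as computed against a single database piece


def rescale_evalue(evalue, piece_residues, total_residues):
    """Rescale a per-piece E-value to the full database size.

    Assumes E is proportional to database length (E = K*m*n*exp(-lambda*S)),
    so only the ratio of sizes matters. Edge corrections are ignored.
    """
    if piece_residues <= 0 or total_residues <= 0:
        raise ValueError("database sizes must be positive")
    return evalue * (total_residues / piece_residues)


def merge_hits(per_piece, total_residues, max_hits=10):
    """Merge hits from several database pieces into one ranked list.

    per_piece: list of (piece_residues, [Hit, ...]) tuples, one per piece.
    Returns at most max_hits hits with E-values rescaled to the full
    database, sorted by E-value (ties broken by higher bit score).
    """
    merged = []
    for piece_residues, hits in per_piece:
        for hit in hits:
            full_e = rescale_evalue(hit.evalue, piece_residues, total_residues)
            merged.append(replace(hit, evalue=full_e))
    merged.sort(key=lambda h: (h.evalue, -h.bit_score))
    return merged[:max_hits]


if __name__ == "__main__":
    # Two pieces of 1000 and 3000 residues from a 4000-residue database.
    pieces = [
        (1000, [Hit("seqA", 50.0, 1e-5)]),
        (3000, [Hit("seqB", 60.0, 3e-6)]),
    ]
    for hit in merge_hits(pieces, total_residues=4000):
        print(hit.subject_id, hit.bit_score, hit.evalue)
```

The alternative is to tell the search program the full database size up front so each piece reports already-corrected E-values (NCBI blastall has a -z option for effective database length); in that case the merge is just a sort.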