On Tue, 2003-07-15 at 12:27, Nicholas Henke wrote:
[...]
> The one 'practical' situation we see here is on our Genomics cluster,
> where they are running BLAST on very large data sets. It makes an
> extremely large difference to copy the data to a local drive and use
> that than to access the data via NFS.

One thing you can do is segment the databases (use the -v switch on
formatdb). Or, if you don't care about the absolute E-values being
correct relative to your real database size, you could pre-segment the
database using a tool such as our segment.pl at
http://scalableinformatics.com/downloads/segment.pl .

The large cost of disk access for large BLAST jobs comes from the way
it mmaps the indices. If the indices overflow available memory, you
spend your time in disk I/O bringing them into memory as you walk
through them. This lowers your overall absolute performance.

Regardless of the segmentation, it is rarely a good idea (except in the
case of very small databases) to keep the databases on NFS for the
computation. Even if they are small, you are going to suffer network
congestion very quickly for a reasonable number of compute nodes.

Of course this gets into the problem of moving the databases out to the
compute nodes. We are working on a neat solution to the data motion
problem (specifically, the problem of transporting the databases to the
compute nodes). To avoid annoying everyone, please go offlist if you
want to speak to us about it. Email/phone in .sig.

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC
email: landman@scalableinformatics.com
web: http://scalableinformatics.com
phone: +1 734 612 4615
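
For anyone who wants a concrete picture of the pre-segmentation idea mentioned above, here is a minimal Python sketch. It is not segment.pl and it is not what formatdb -v does internally; the function names (read_fasta, segment_records) and the max_letters cap are hypothetical, chosen only for illustration. It assumes the database starts life as a FASTA file and simply groups consecutive records so that each segment stays under a residue cap small enough for its indices to fit in memory.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def segment_records(records, max_letters):
    """Group records into consecutive segments of at most max_letters residues.

    A single record longer than max_letters still gets a segment of its own.
    """
    segments, current, size = [], [], 0
    for header, seq in records:
        # Start a new segment when adding this record would exceed the cap.
        if current and size + len(seq) > max_letters:
            segments.append(current)
            current, size = [], 0
        current.append((header, seq))
        size += len(seq)
    if current:
        segments.append(current)
    return segments
```

Each segment would then be written out and formatted as its own database. As noted above, E-values reported against a segment are relative to that segment's size, not to the size of the full database.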