> Hi, I'm the administrator of the bioinformatics laboratory at
> Université du Québec à Montréal. I have a room filled with dual P4
> 3 GHz workstations. The boxen are dual-booted with Windows and
> GNU/Linux, but they spend most of their time on GNU/Linux. Each box
> has 2 GB of RAM, so I expected decent performance with local BLAST
> jobs, but the sad truth is that my jobs run about 4 times slower
> with blast2 than with blastcl3 with the same parameters. The hard
> drive is IDE, so I suspect a bottleneck here.

Make sure the IDE drivers are configured to use DMA I/O (on Linux,
"hdparm -d /dev/hda" typically reports whether DMA is enabled). But if
repeat searches of a database are just as slow as the first search,
then experience indicates the problem is that the amount of free
memory available is insufficient to cache the database files. Database
file caching is a tremendous benefit for blastn searches. If your jobs
use too much heap memory, though, no memory may be left over for file
caching.

> Strangely, if I set the number of threads to 2 or 3 with -a, my
> jobs run slower.

Use of more threads requires more working (heap) memory for the
search, leaving less memory available to cache the database files. If
the database files aren't cached, more threads mean more terribly slow
disk head seeking, as the different threads request different pieces
of the database. If heap memory expands beyond the physical memory
available, the system will thrash. With WU-BLAST, multiple threads are
used by default, but if memory is found to be limiting, the program
automatically reduces the number of threads employed, to avoid
thrashing. For WU-BLAST, the nucleotide sequence database files that
are most important to cache are the compressed sequence file and the
table file, which have the extensions .xns and .xnt.

> Do you think there is a way to link those machines together in
> order to get better performance?
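The trade-off above (each extra thread takes heap memory that would
otherwise cache database files) can be sketched as a rough budgeting
calculation. This is only an illustration, not anything BLAST itself
does; the function name and all the figures (heap per thread, cache
size for the database files) are hypothetical, and a real machine also
spends RAM on the OS and other processes.

```python
def plan_blast_memory(ram_bytes, db_cache_bytes, heap_per_thread_bytes,
                      max_threads=4):
    """Rough memory budget for a multi-threaded BLAST-style search.

    Returns (db_fits_in_cache, recommended_threads): the largest thread
    count (up to max_threads) whose combined heap still leaves room to
    cache the database files, and whether the database fits at all.
    Hypothetical sketch -- real memory use is messier than this.
    """
    threads = max_threads
    # Back off on threads until the heap plus the cached database fit in RAM.
    while threads > 1:
        if threads * heap_per_thread_bytes + db_cache_bytes <= ram_bytes:
            break
        threads -= 1
    db_fits = threads * heap_per_thread_bytes + db_cache_bytes <= ram_bytes
    return db_fits, threads
```

With made-up numbers resembling the machines described (2 GB RAM, 1.5 GB
of database files to cache, a few hundred MB of heap per thread), the
sketch recommends a single thread: a second or third thread would push
the database files out of the cache, which matches the observed
slowdown with -a 2 or -a 3.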
> I think that I can't use something like mpiBlast, because there is
> always a risk that a node gets rebooted under Windows, making a part
> of the database suddenly unavailable.

If the system is not in a thrashing state due to its heap memory
requirements (or the heap memory requirements of other concurrent
processes), segmentation of the database can permit nodes to cache
their assigned portion of the database. Only the first search is slow
then; subsequent searches use the cached copy of the database segment.
With WU-BLAST, the database can be segmented dynamically at run time
for each compute node, using the dbslice option (see
http://blast.wustl.edu/blast/parameters.html#dbslice). By distributing
each job across multiple nodes and assigning the same slice to each
node for every job, you'll be able to take advantage of file caching.
If nodes come and go from the cluster, just re-assign slices -- no
need to re-format the database.

More information about BLAST memory use is available at
http://blast.wustl.edu/blast/Memory.html.

--Warren
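The slice-assignment idea can be sketched as a small scheduling
function. To be clear, this is not part of WU-BLAST (the actual
dbslice syntax is in the parameters page linked above); it only
illustrates keeping the node-to-slice mapping stable so repeat
searches hit each node's file cache, and re-assigning cheaply when a
node reboots into Windows. The node names are hypothetical.

```python
def assign_slices(nodes, total_slices):
    """Map database slices 1..total_slices onto the live nodes.

    Sorting the node list makes the mapping stable while membership is
    stable, so each node keeps re-searching (and thus caching) the same
    slice of the database.  When a node drops out, call this again with
    the surviving nodes -- no re-formatting of the database is needed.
    """
    nodes = sorted(nodes)              # stable order -> stable mapping
    assignment = {n: [] for n in nodes}
    for s in range(1, total_slices + 1):
        assignment[nodes[(s - 1) % len(nodes)]].append(s)
    return assignment
```

For example, with three nodes and six slices each node always serves
the same two slices; if one node reboots into Windows, re-running the
function over the two survivors redistributes all six slices without
touching the formatted database.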