Questions have come up on this list about mpiblast and its copying of database segments to individual nodes.

Here, we have a 20 node dual CPU beowulf. Previously, I was using mpiformatdb -N 20. Then it hit me: even though I start lamboot with cpus=2 for each node, in reality blast only runs one instance on each node. Now I get one segment per node. But what happens when I start another job? Maybe mpiblast doesn't use the same CPU for this next job, so it copies another segment, and so forth...

So this morning, I did mpiformatdb -N 44 (42 cpus + 1 dual master, as Jason Gans suggested) for protein nr. Then I ran the test I used previously: it's a simple protein nr sequence which takes about 3 mins on one node. I started top on various nodes, and ls in the local storage.

The very first time I ran it, of course, the segments (this time 2 per node) were copied to local storage. Speed wasn't great (nfsd was the problem), but it was obviously faster than one node. Total time for the first run was 1 min, 23 secs.

With the same top and ls windows open, I ran it again. This time there was no copying, and each node still had its segments. Total time for the second run was 5.9 seconds!

Then I thought, "Ok, maybe it caches the sequence so it didn't need to copy segments anymore." So I took a different nr protein sequence and ran it in mpiblast. To my surprise, there was no copying, and each node still had its segments. Total time for the new sequence was 5.9 seconds.

I can only assume after this little experiment that formatting a db to the exact number of nodes/cpus is the key.

Now, on to my next problem: keeping the segments in local storage so another user doesn't have to go through the copy process. I'll keep you all posted.

--
Jeremy Mann
jeremy@biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672