Jeremy Mann wrote:
> Then how would you tell blastall which nodes have which *piece* of the
> database?
>

Depends.

If all your nodes have all the pieces, then you just submit multiple
blastall searches to your cluster, each search specifying only the
database segment you want to query against. Easy. The harder part is
getting the multiple responses back and merging them into something
sensible.

If your nodes do not have all the fragments on hand, then you don't tell
blastall. You tell your cluster load management system (PBS, GridEngine,
LSF, etc.) to run your searches on a specific machine, queue, or
consumable/static resource.

There are lots of ways to do this. You can manually tell GridEngine or
LSF to run job X on host Y, or you can make this a bit more abstract by
making your cluster job scheduler aware of which nodes have which
pieces. This can be done by configuring LSF or GridEngine with custom
static or dynamic resource attributes. Once that is done you can tell
LSF, for instance, to "run this blast job on any machine in my cluster
that has the attribute NCBI-GENBANK-PART-1 set to 'true'".

Back in my Blackstone Computing days we had a cool solution to this
called smartcache. We basically added "data aware" scheduling
capabilities to LSF or GridEngine. The end result was that the scheduler
"knew" where the database pieces were and could allocate jobs
accordingly to the proper machine or queue.

-Chris

-- 
Chris Dagdigian, <dag@sonsorol.org>
BioTeam Inc. - Independent Bio-IT & Informatics consulting
Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
PGP KeyID: 83D4310E  Yahoo IM: craffi  Web: http://bioteam.net
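
P.S. A rough sketch of the two ideas above, in Python, in case it helps. It is not a tested pipeline. It assumes legacy blastall with tabular output (-m 8, where column 12 is the bit score) and a GridEngine boolean resource that your admin has already defined. The resource name genbank_part1, the file names, and all three function names are made up for illustration. Hits are merged by bit score because e-values from separately searched fragments are only comparable if the effective database size was set consistently on every search.

```python
from typing import Iterable, List


def blastall_command(query: str, fragment: str, out: str,
                     program: str = "blastn") -> List[str]:
    """Build one legacy blastall invocation against a single database fragment."""
    return [
        "blastall",
        "-p", program,
        "-d", fragment,  # database fragment to search
        "-i", query,     # query FASTA file
        "-o", out,       # per-fragment output file
        "-m", "8",       # tabular output, one hit per line
    ]


def gridengine_wrap(command: List[str], resource: str) -> List[str]:
    """Wrap a command in qsub, requesting a boolean resource.

    The resource name is hypothetical; it must already exist as a
    complex attribute attached to the hosts that hold the fragment.
    """
    return ["qsub", "-b", "y", "-l", f"{resource}=true"] + command


def merge_tabular(outputs: Iterable[str], top_n: int = 10) -> List[str]:
    """Merge tabular hit lines from several fragments.

    Keeps the top_n hits per query, ranked by bit score (column 12).
    """
    hits = []
    for text in outputs:
        for line in text.splitlines():
            fields = line.split("\t")
            if line.startswith("#") or len(fields) < 12:
                continue  # skip comments and malformed lines
            hits.append((fields[0], float(fields[11]), line))

    # Group by query id, best bit score first within each query.
    hits.sort(key=lambda h: (h[0], -h[1]))

    merged: List[str] = []
    kept = {}
    for query_id, _score, line in hits:
        kept[query_id] = kept.get(query_id, 0) + 1
        if kept[query_id] <= top_n:
            merged.append(line)
    return merged
```

You would build one command per fragment, wrap it with the resource that marks the hosts holding that fragment, submit, and then feed the contents of the per-fragment output files to merge_tabular once all the jobs have finished.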