Dear Bio-cluster folks,

This test (a quick hack) shows how one can use multiple computers, including spare MacOSX, Windows, and Linux workstations, to distribute and speed up large biosequence analyses, BLAST in this example. If you can split a large data set into small subsets distributed to many computers, analyze each subset, and reassemble the subset results into a whole, you should be able to trade compute nodes for time.

Note that the data directory at bio-mirror.net is fragile. Please don't test more than a few thousand sequences fetched from the bio-mirror.net data directory (though I've tested millions, it is still subject to occasional faults and hasn't been tested under a large load). Feel free to set up and run large tests on your own computers :)

Note also that documentation on these datagrid directories is still sparse. See
  http://iubio.bio.indiana.edu/biogrid/
  http://iubio.bio.indiana.edu/biogrid/directories/
  http://iubio.bio.indiana.edu/biogrid/directories/gridlets/

For each compute node on your test grid, do this:

1. Install/test/locate the NCBI BLAST software.
2. Download the Biogridlet .class and .prop files. Edit the .prop properties to use the biosequence databank you want.
3. Find a query biosequence somewhere.
4. Use Biogridlet to copy a databank subset to each node and run blast (here $bl stands for wherever the BLAST programs are installed):
   1. node1:
        java Biogridlet start=0 count=1000 | $bl/formatdb -i stdin -p F -o T -n databank1
        $bl/blastall -p blastn -d databank1 -i query -m 8 -o databank1.out
   2. node2:
        java Biogridlet start=1000 count=1000 | $bl/formatdb -i stdin -p F -o T -n databank2
        $bl/blastall -p blastn -d databank2 -i query -m 8 -o databank2.out
   3. node3 .. n
5. Copy the blast results from each node and assemble them into a full result (yet to do; see NBLAST for how :)

The runtime cost for this grid example, from a few quick tests, is approximately the time it takes to run on one computer with the full databank, divided by the number of nodes and subset databanks you use.
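
As a rough sketch of steps 4 and 5, the subset ranges and the final merge could be scripted along the following lines. This is not part of Biogridlet or NBLAST: the function names plan_subsets and merge_results are made up for illustration, the total number of sequences in the databank is assumed to be known in advance, and the per-node databankN.out files are assumed to have been copied to one machine already. The merge orders hits by bit score (field 12 of the -m 8 tabular output) rather than E-value, because each node's E-values are computed against its own subset and so are not directly comparable to a full-databank run; blastall's -z option (effective database length) may help there.

```shell
#!/bin/sh

# Hypothetical helper: print the start= and count= values for each node,
# given the total number of sequences and the number of nodes.
plan_subsets() {
    total=$1
    nodes=$2
    # Round up so that every sequence is assigned to some node.
    count=$(( (total + nodes - 1) / nodes ))
    i=0
    while [ "$i" -lt "$nodes" ]; do
        start=$(( i * count ))
        remaining=$(( total - start ))
        # Stop once the databank is used up (more nodes than sequences).
        if [ "$remaining" -le 0 ]; then
            break
        fi
        # The last node only gets whatever is left over.
        if [ "$remaining" -lt "$count" ]; then
            n=$remaining
        else
            n=$count
        fi
        echo "node$(( i + 1 )) start=$start count=$n"
        i=$(( i + 1 ))
    done
}

# Hypothetical helper: merge per-node tabular (-m 8) result files.
# Hits are grouped by query ID (field 1), best bit score (field 12) first.
merge_results() {
    tab=$(printf '\t')
    cat "$@" | sort -t "$tab" -k1,1 -k12,12nr
}
```

For example, "plan_subsets 2500 3" prints

  node1 start=0 count=834
  node2 start=834 count=834
  node3 start=1668 count=832

and "merge_results databank*.out > all.out" would collect the node results into one file.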

This test bypasses the sophistication of grid infrastructure like Globus, GridEngine, etc. for the sake of simplicity. It could eventually work somewhere between SETI@HOME and Globus in terms of simplicity versus control.

-- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd@bio.indiana.edu