>> I am in a similar position with a new cluster in our lab. I am
>> still very early in the learning curve and your comments for Sandy
>> are very helpful. We have an all vs all Blast job that we would
>> like to run. I am trying to get a handle on how best to run this

I heartily agree with Dave Adelson's comments. My only additional
caveat is to know what you're trying to accomplish biologically before
throwing computation at the problem. Are these ESTs that you're trying
to contig? A gene set from an organism in which you're looking for
paralogs? A chunked-up whole chromosome?

BioPerl is a great set of tools for scripting just about any search
you might want to do. The real trick is in picking the tool that's
appropriate to the biological question at hand. There are lots of
great tools out there. For chromosome-scale searches, MegaBlast is a
great piece of software.

My advice in terms of BLAST speed is to get an estimate first: format
your sequence set as a target (using formatdb), then run a search on a
single machine using the first 10 sequences in your dataset as
queries. Do the math (scale that timing up to the full set) and figure
out how many CPU hours you're up against. If you can get it done with
an hour of work using vi, followed by an overnight run, there's no
reason to spend a week writing a comprehensive solution.

> We are mainly interested in high homology hits

A nitpicky point: homology is an evolutionary relationship. It's
descent from a common ancestor. Homology is therefore a boolean (true
or false) sort of property. A high degree of sequence similarity is
frequently an indicator of homology, although it's neither necessary
nor sufficient.

-Chris Dwan
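
P.S. The back-of-the-envelope estimate described above (time a search on the first 10 sequences, then scale up) might look something like the Python sketch below. It is only an illustration: the function names and all the numbers in the usage note are made up, and it assumes the cost grows linearly with the number of queries against a fixed target, and that the first 10 sequences are roughly typical in length. It does not run BLAST itself; you would time the sample search however you like and plug the number in.

```python
def first_n_fasta_records(lines, n=10):
    """Yield the lines of the first n FASTA records (hypothetical helper).

    Use it to write out a small sample query file from the full dataset.
    """
    seen = 0
    for line in lines:
        if line.startswith(">"):
            seen += 1
            if seen > n:
                break
        if seen:
            yield line


def estimate_cpu_hours(sample_seconds, sample_size, total_sequences):
    """Linearly extrapolate a timed sample search to the full query set.

    Assumes every query costs about the same as the sampled ones.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    seconds_per_query = sample_seconds / sample_size
    return seconds_per_query * total_sequences / 3600.0


def wall_clock_hours(cpu_hours, cpus):
    """Best-case wall-clock time if the work splits evenly across CPUs."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return cpu_hours / cpus
```

For example, with invented numbers: if the 10-sequence sample took 120 seconds and the full set has 50,000 sequences, estimate_cpu_hours(120, 10, 50000) comes to roughly 167 CPU hours, and wall_clock_hours of that over 16 CPUs is a bit over 10 hours, i.e. an overnight run on a small cluster. Treat the result as an order-of-magnitude figure, since real query times vary with sequence length and the number of hits.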