[Bioclusters] Condor cluster and BLAST
Christopher Dwan
cdwan at bioteam.net
Wed Jan 25 15:45:09 EST 2006
>> I am in a similar position with a new cluster in our lab. I am
>> still very early in the learning curve and your comments for Sandy
>> are very helpful. We have an all vs all Blast job that we would
>> like to run. I am trying to get a handle on how best to run this
I heartily agree with Dave Adelson's comments. My only additional
caveat is to know what you're trying to accomplish biologically
before throwing computation at the problem. Are these ESTs that
you're trying to contig? A gene set from an organism in which you're
looking for paralogs? A chunked-up whole chromosome?
BioPerl is a great set of tools for scripting just about any search
you might want to do. The real trick is in picking the tool that's
appropriate to the biological questions at hand. There are lots of
great tools out there. For chromosome-scale searches, MegaBlast is a
great piece of software.
My advice in terms of BLAST speed is to get an estimate first:
Format your sequence set as a target (using formatdb) and then run a
search on a single machine against the first 10 sequences in your
dataset. Do the math and figure out how many CPU hours you're up
against. If you can get it done with an hour of work using vi,
followed by an overnight run, there's no reason to spend a week
writing a comprehensive solution.
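
To make the "do the math" step concrete, here is a rough Python sketch of
the estimate. The function names, the file name seqs.fasta, and the sample
numbers are made up for illustration, and the command lines in the comments
assume the legacy NCBI formatdb/blastall tools with a nucleotide dataset.
It also assumes search time grows roughly linearly with the number of
queries, which is only an approximation if your sequence lengths vary a lot.

```python
import io


def first_n_fasta(handle, n):
    """Return the text of the first n FASTA records read from handle."""
    kept = []
    seen = 0
    for line in handle:
        if line.startswith(">"):
            seen += 1
            if seen > n:
                break
        if seen:
            kept.append(line)
    return "".join(kept)


def count_fasta(handle):
    """Count FASTA records by counting header lines."""
    return sum(1 for line in handle if line.startswith(">"))


def estimate_cpu_hours(sample_seconds, sample_size, total_sequences):
    """Linear extrapolation from a timed sample to the full query set.

    Assumption: every query costs about as much as the sample average.
    """
    seconds_per_query = sample_seconds / sample_size
    return seconds_per_query * total_sequences / 3600.0


# Hypothetical workflow (file names are placeholders):
#   formatdb -i seqs.fasta -p F
#   ... write first_n_fasta(open("seqs.fasta"), 10) to first10.fasta ...
#   time blastall -p blastn -d seqs.fasta -i first10.fasta -o first10.out
# Then plug the measured wall-clock seconds in below.
if __name__ == "__main__":
    demo = io.StringIO(">a\nACGT\n>b\nGGCC\n>c\nTTAA\n")
    print(first_n_fasta(demo, 2), end="")
    # Made-up numbers: 10 queries took 90 seconds, 40000 sequences total.
    print(estimate_cpu_hours(90, 10, 40000), "CPU hours (rough estimate)")
```

If the answer comes out at a few hundred CPU hours and you have a few dozen
nodes, splitting the query file into chunks and submitting one job per
chunk is probably all the infrastructure you need.
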
> We are mainly interested in high homology hits
A nitpicky point: homology is evolutionary relationship. It's
descent from a common ancestor. Homology is therefore a boolean
(true or false) sort of property. A high degree of sequence
similarity is frequently an indicator of homology, although it's
neither necessary nor sufficient.
-Chris Dwan