[Bioclusters] distributed blasting of genomes and WASHU blast

Joseph Landman bioclusters@bioinformatics.org
Tue, 11 Feb 2003 23:26:30 -0500


Tim:


  1)  see http://blast.wustl.edu/blast/README.html#Tofly and look for 
hspmax= (among others)

  2)  Is your database an assembled genome, e.g. one sequence per chromosome 
or similarly sized entity?  If so, you might look at splitting the 
database into smaller sequences, either at low-complexity regions or into 
overlapping segments of various lengths.  Which approach fits depends upon 
what information you are trying to get at.
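
As a rough illustration of the overlapping-segment idea, here is a minimal Python sketch.  It is only one way to do it, not anything WU-BLAST ships: the function names, the header format (name:start-end, 1-based) and the 60-column line width are all my own assumptions.  The overlap should be at least as long as the longest hit you expect to straddle a cut point.

```python
def overlapping_segments(seq, size, overlap):
    """Yield (start, segment) pairs covering seq; start is a 0-based offset.

    Consecutive segments share `overlap` bases, so a hit shorter than
    the overlap cannot be lost at a cut point.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not seq:
        return
    step = size - overlap
    start = 0
    while True:
        yield start, seq[start:start + size]
        if start + size >= len(seq):
            break  # this segment reached the end of the sequence
        start += step


def write_fasta(name, seq, size, overlap, out, width=60):
    """Write the segments as FASTA records to the file-like object `out`.

    The header format is hypothetical; it records 1-based coordinates in
    the original sequence so hits can be mapped back afterwards.
    """
    for start, seg in overlapping_segments(seq, size, overlap):
        out.write(f">{name}:{start + 1}-{start + len(seg)}\n")
        for i in range(0, len(seg), width):
            out.write(seg[i:i + width] + "\n")
```

For example, write_fasta("chr1", sequence, 100000, 5000, open("chr1_split.fa", "w")) would cut a chromosome into 100 kb pieces overlapping by 5 kb; hits that fall in the overlap will show up twice and need to be de-duplicated after mapping coordinates back.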

Joe 

Tim Harsch wrote:

>Two questions here (the quick one first):
>    1)    How do you tell WASHU blast to return more than 1000 hits when
>using tblastx?
>
>    2)    If I have two large genomes that need a lengthy blast, how can I
>split that up?
>
>Just considering an SMP machine for now, perhaps SGE later..  As we know
>threading is not as effective as individual blasts.  In my case, with one
>genome as the database and one as the query, WASHU blast is never using more
>than one thread so no parallelism is achieved.  I'm thinking that I could
>take my query sequence, split it into X parts, and blast one part per CPU, but
>then what about the boundaries between sequences as possible hits?  If I
>want to assume no before-hand knowledge of the genome here, I'm thinking I
>could process the results from the X parts, find the stop base of the last
>hit on the X-1 part, call it A, and the start base of the first hit of the X
>part, call it B, and create a subsequence from A to B from the original
>database sequence, repeat for all boundaries of the X parts, then blast
>these new subsequences against the database, then union the hits from this
>with hits from the X parts.
>
>If I'm correct, using this method my e-values would even be the same as if
>I had done a simple one-on-one comparison, because my database never
>changes.
>
>Does this sound reasonable?  Even so, if there is an easier method then I
>sure would like to hear it.
>
>Ciao,
>
>Tim Harsch
>Computer Scientist
>Lawrence Livermore National Laboratory

-- 
Joseph Landman, Ph.D
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://scalableinformatics.com
phone: +1 734 612 4615