[Bioclusters] Problems with a large query sequence in BLAST

Aaron Darling darling at cs.wisc.edu
Wed Mar 23 10:30:58 EST 2005


This sounds more like a job for a global genome aligner than for BLAST.

LAGAN and MAVID are great if you are certain there are no rearrangements 
in your data.
Shuffle-LAGAN will handle pairs of possibly rearranged sequences.
Mauve (my own project) will do multiple rearranged sequences and comes 
with a visualization component.

LAGAN et al.:  http://lagan.stanford.edu/lagan_web/index.shtml
Mavid:   http://baboon.math.berkeley.edu/mavid/
Mauve:   http://gel.ahabs.wisc.edu/mauve

-Aaron


Jan van Haarst wrote:

>On Thu, 17 Mar 2005 07:37:40 -0800, Ian Korf <iankorf at mac.com> wrote:
>
>>What genome does the BAC come from? What are you trying to do exactly?
>
>The data are from tomato and potato, and as there is no way to
>predict genes well, we use BLAST to get a first rough look at the
>data.
>
>
>>You didn't answer that. By the way, there's a really good book on BLAST
>>from O'Reilly & Associates that discusses these issues in great detail.
>
>I know of your book, just haven't had a chance to buy & read it yet.
>
>Maybe I should explain myself better so you all can help me better.
>
>What we are trying to do is get a rough idea of what genes are present on a
>newly sequenced and assembled BAC. The normal way would be to use gene
>prediction software to predict the genes, and BLAST those genes.
>But because there aren't good models (yet) for these genomes, we need
>another way to get a quick look.
>
>When one BLASTs a large query, in our case 65K, the probability of
>hitting a well-preserved gene is large. And as those genes will give
>a lot of hits, the rest of the genes will not show up unless you set
>the number of hits to show very high.
>But setting the number of results high makes the end user unhappy, as
>they will have to wade through a lot of the same data to see the more
>interesting bits.
>
>What I would like is a method to limit the number of hits per region,
>so for every hit you only see the first 10 or so. NCBI BLAST has such
>an option (-K), but as I already said, it doesn't and apparently never
>will work.
>
>I haven't been able to find a solution yet; maybe somebody can point
>me in the right direction?
>
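
The per-region limit asked for above can also be approximated outside of
BLAST by post-filtering its tabular output. The following is only a minimal
sketch under stated assumptions, not a replacement for the -K option: it
assumes the standard 12-column tabular format (-m 8 in legacy NCBI BLAST,
-outfmt 6 in BLAST+), and it defines a "region" as a fixed-size window on the
query, which is an arbitrary choice. The function names, the window size and
the default of 10 hits are all hypothetical.

```python
from collections import defaultdict


def parse_tabular(lines):
    """Parse 12-column BLAST tabular lines into a list of hit dicts.

    Assumed columns: query, subject, %identity, alignment length,
    mismatches, gap opens, q.start, q.end, s.start, s.end, e-value, bit score.
    Blank lines and comment lines (starting with '#') are skipped.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Order the query coordinates so that qstart <= qend.
        qstart, qend = sorted((int(fields[6]), int(fields[7])))
        hits.append({
            "query": fields[0],
            "subject": fields[1],
            "qstart": qstart,
            "qend": qend,
            "evalue": float(fields[10]),
            "bitscore": float(fields[11]),
        })
    return hits


def limit_hits_per_region(hits, window=1000, max_hits=10):
    """Keep at most max_hits hits per query window, best bit score first.

    A hit is assigned to the window containing the midpoint of its
    query interval (an assumption made to keep the sketch simple).
    """
    kept = []
    counts = defaultdict(int)
    for hit in sorted(hits, key=lambda h: h["bitscore"], reverse=True):
        midpoint = (hit["qstart"] + hit["qend"]) // 2
        key = (hit["query"], midpoint // window)
        if counts[key] < max_hits:
            counts[key] += 1
            kept.append(hit)
    return kept


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python limit_hits.py < blast_output.tsv
    for hit in limit_hits_per_region(parse_tabular(sys.stdin)):
        print(hit["query"], hit["subject"], hit["qstart"], hit["qend"],
              hit["evalue"], hit["bitscore"], sep="\t")
```

Run this way, BLAST itself can be left to report a very high number of hits,
and only the filtered list is shown to the end user. A hit that spans several
windows is counted once, in the window holding its midpoint, so a smaller
window gives finer regions at the cost of more output.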