[Bioclusters] Problems with a large query sequence in BLAST
Aaron Darling
darling at cs.wisc.edu
Wed Mar 23 10:30:58 EST 2005
This sounds more like a job for a global genome aligner than for BLAST.
LAGAN and MAVID are great if you are certain there are no rearrangements
in your data.
Shuffle-LAGAN will handle pairs of possibly rearranged sequences.
Mauve (my own project) will do multiple rearranged sequences and comes
with a visualization component.
LAGAN et al.: http://lagan.stanford.edu/lagan_web/index.shtml
MAVID: http://baboon.math.berkeley.edu/mavid/
Mauve: http://gel.ahabs.wisc.edu/mauve
-Aaron
Jan van Haarst wrote:
>On Thu, 17 Mar 2005 07:37:40 -0800, Ian Korf <iankorf at mac.com> wrote:
>
>
>
>>What genome does the BAC come from? What are you trying to do exactly?
>>
>>
>The data are from tomato and potato, and as there is no way to
>predict genes well, we use BLAST to get a first rough look at the
>data.
>
>
>
>>You didn't answer that. By the way, there's a really good book on BLAST
>>from O'Reilly & Associates that discusses these issues in great detail.
>>
>>
>
>I know of your book, just haven't had a chance to buy & read it yet.
>
>Maybe I should explain myself better so you all can help me better.
>
>What we try to do is get a rough idea of what genes are present on a
>newly sequenced and assembled BAC. The normal way would be to use gene
>prediction software to predict the genes, and blast those genes.
>But because there aren't good models (yet) for these genomes, we need
>another way to get a quick look.
>
>When one BLASTs a large query, in our case 65K, the probability of
>hitting a well-conserved gene is large. And as those genes will give
>a lot of hits, the rest of the genes will not show up, unless you set
>the number of hits to show very high.
>But setting the number of results high makes the end-user unhappy, as
>they will have to wade through a lot of the same data to see the more
>interesting bits.
>
>What I would like is a method to limit the number of hits per region,
>so for every hit region you only see the first 10 or so. NCBI BLAST has
>such an option (-K), but as I already said, it doesn't work and
>apparently never will.
>
>I haven't been able to find a solution yet; maybe somebody can point
>me in the right direction?
>
>
>
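One way to approximate the per-region limit asked for above is to
post-filter the BLAST results instead of relying on -K. The sketch below
is only an illustration, not a tested pipeline: it assumes the standard
12-column tabular output (-m 8 in NCBI blastall), and the function names,
the 50% overlap threshold, and the default of 10 hits are all assumptions
chosen for the example. Hits are visited from best to worst bit score,
and a hit is kept only if fewer than max_per_region already-kept hits
overlap the same stretch of the query.

```python
from collections import namedtuple

# Only the columns needed for filtering are kept from each tabular line.
Hit = namedtuple("Hit", "query subject qstart qend evalue bitscore")


def parse_tabular(lines):
    """Parse 12-column tab-separated BLAST output, skipping comments and blanks."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        # Normalise the query coordinates so that qstart <= qend.
        qstart, qend = sorted((int(f[6]), int(f[7])))
        hits.append(Hit(f[0], f[1], qstart, qend, float(f[10]), float(f[11])))
    return hits


def overlap_fraction(a, b):
    """Fraction of the shorter query interval covered by the overlap of a and b."""
    overlap = min(a.qend, b.qend) - max(a.qstart, b.qstart) + 1
    if overlap <= 0:
        return 0.0
    shorter = min(a.qend - a.qstart, b.qend - b.qstart) + 1
    return overlap / shorter


def limit_hits_per_region(hits, max_per_region=10, min_overlap=0.5):
    """Keep at most max_per_region overlapping hits per query region.

    Hits are processed by descending bit score; a hit is dropped when
    max_per_region better hits already cover the same query region.
    """
    kept = []
    for hit in sorted(hits, key=lambda h: h.bitscore, reverse=True):
        crowd = sum(
            1
            for k in kept
            if k.query == hit.query and overlap_fraction(hit, k) >= min_overlap
        )
        if crowd < max_per_region:
            kept.append(hit)
    return kept
```

Typical use would be to read the tabular file, call
limit_hits_per_region(parse_tabular(open_file)), and print the surviving
hits. This is quadratic in the number of hits in the worst case, which
should be tolerable for a single BAC but would need an interval index
for much larger result sets.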