[Bioclusters] Problems with a large query sequence in BLAST

Lucas Carey lcarey at odd.bio.sunysb.edu
Thu Mar 24 12:07:04 EST 2005


What if you were to set the gap extension cost very low? This should have the effect of joining multiple hits in conserved regions that are separated by regions of low conservation into a single alignment.
Alternatively, you could output a high number of results and write a Perl script to grab the top N hits from each database entry.
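
As a rough sketch of that second idea (in Python rather than Perl, and untested against real output), something like the following could work. It assumes the BLAST results were written in 12-column tab-delimited form (for example with blastall -m 8), where column 2 is the database entry ID and column 12 is the bit score. The function name top_hits_per_subject and the default of 10 are made up for illustration.

```python
from collections import defaultdict


def top_hits_per_subject(lines, n=10):
    """Return at most n hit lines per database entry, best bit score first.

    Assumes 12-column tab-delimited BLAST output: fields[1] is the
    subject (database entry) ID and fields[11] is the bit score.
    """
    by_subject = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines (some tabular modes emit '#').
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        subject_id = fields[1]
        bit_score = float(fields[11])
        by_subject[subject_id].append((bit_score, line))

    kept = []
    # Subjects are visited in the order they first appear in the input.
    for hits in by_subject.values():
        hits.sort(key=lambda hit: hit[0], reverse=True)
        kept.extend(hit_line for _, hit_line in hits[:n])
    return kept
```

You would feed it the lines of the results file and print whatever comes back. Note that this limits hits per database entry, not per region of the query, so it only approximates what -K was supposed to do.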

-Lucas

On Thursday, March 24, 2005 at 10:38 +0100, Jan van Haarst wrote:
> When one BLASTs a large query, in our case 65K, the probability of
> hitting a well-conserved gene is high. And as those genes will give
> a lot of hits, the rest of the genes will not show up unless you set
> the number of hits to show very high.
> But setting the number of results high makes the end-user unhappy, as
> they will have to wade through a lot of the same data to see the more
> interesting bits.
> 
> What I would like is a method to limit the number of hits per region,
> so for every hit you only see the first 10 or so. NCBI BLAST has such
> an option (-K), but as I already said, it doesn't work and apparently
> never will.

