Try BLAT.

>From: "L. Mui" <lmui at stanford.edu>
>Reply-To: "Clustering, compute farming & distributed computing in life
>science informatics" <bioclusters at bioinformatics.org>
>To: Chris Dwan <cdwan at bioteam.net>, pculpep at hotmail.com
>CC: "Clustering, compute farming & distributed computing in life science
>informatics" <bioclusters at bioinformatics.org>
>Subject: Re: [Bioclusters] sensitivity & blast
>Date: Thu, 7 Apr 2005 00:35:33 -0700
>
>Chris and Pam,
>
>Thanks for your insights in the emails.
>
>About what we are trying to do: we are trying to select 70mer DNA oligos
>for microarrays. We try to select the "best" oligo set, one that
>(1) minimizes cross-hybridization with non-self sequences in the genome
>while (2) maximizing target binding.
>
>The troubling point that led to my earlier question is:
>
>(1) From the results of feeding query sequences of varying length to
>blastall, we select 70mers based on the two goals above.
>
>(2) When we feed the 70mers into blastall again, we get different HSPs
>when the e-value is fixed at the default of 10.
>
>From your feedback, setting the "-Y" value seems to be a sensible way to
>remove the dependence on input size. Won't this restriction of the search
>space reduce the probability of finding the best HSPs?
>
>Also: since we know the expect value is E = K m n exp(-lambda S), why not
>find a base E for a given query length, and then scale the -e value as mE?
>
>Chris, you mentioned that there are other tools we should look at. Please
>advise on this.
>
> Lik
>
>
>Quoting Chris Dwan <cdwan at bioteam.net>:
>
> > > Could you suggest whether we are on the right track? What is the right
> > > approach to set a uniform sensitivity for all inputs?
> >
> > E-values already incorporate statistics to eliminate (normalize for) a
> > number of factors, including query size. Getting rid of that
> > normalization is possible, but not necessarily a good idea unless you
> > know exactly what you're doing.
> >
> > E-values for identical HSPs grow with the product of the sizes of the
> > query and the target set. The rationale is that the same hit will be
> > more and more likely to occur by random chance in a larger sample of
> > sequence. Said HSPs will be less and less statistically interesting as
> > the query and the target set grow.
> >
> > This leads to your observation that you must increase the E-value
> > threshold to keep getting the same hits.
> >
> > The question you seem to be asking is "find me all of the HSPs that fit
> > some criterion, regardless of their statistical significance." The
> > question that BLAST is designed to answer is "find me most of the
> > statistically significant HSPs for some particular search, and extend
> > them to build up gapped local alignments."
> >
> > If you're willing to share your goal in running these searches, the
> > list might be able to suggest alternative tools better suited to your
> > problem.
> >
> > -Chris Dwan
> > The BioTeam
>
>_______________________________________________
>Bioclusters maillist - Bioclusters at bioinformatics.org
>https://bioinformatics.org/mailman/listinfo/bioclusters
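To make the scaling discussed in this thread concrete, here is a minimal sketch of the Karlin-Altschul relation E = K m n exp(-lambda S), where m is the query length, n the database length and S the raw score. It is a simplified model, not what blastall computes internally: the K and lambda values are hypothetical placeholders rather than parameters for any real scoring scheme, the genome size is made up, and BLAST's length corrections are ignored. It shows three things: the same raw score gets an E-value proportional to query length; replacing m*n with a fixed search space (roughly what blastall's -Y override does) removes that dependence; and scaling the -e cutoff linearly with m, as Lik proposes, keeps the minimum score constant.

```python
import math

# Hypothetical Karlin-Altschul parameters, for illustration only
# (NOT taken from any real BLAST nucleotide scoring scheme).
K = 0.7
LAMBDA = 1.3


def evalue(score, query_len, db_len, search_space=None):
    """E = K * m * n * exp(-lambda * S).

    If search_space is given it replaces m * n, mimicking a fixed
    effective search space. Real BLAST length corrections are ignored.
    """
    space = search_space if search_space is not None else query_len * db_len
    return K * space * math.exp(-LAMBDA * score)


def score_for_evalue(e_cutoff, query_len, db_len):
    """Smallest raw score S whose E-value is at most e_cutoff."""
    return math.log(K * query_len * db_len / e_cutoff) / LAMBDA


DB_LEN = 3_000_000  # hypothetical genome size in bases
S = 30.0            # the same raw HSP score in every case

# Same score, growing query: E grows linearly unless the space is fixed.
for m in (70, 700, 7000):
    e_plain = evalue(S, m, DB_LEN)
    e_fixed = evalue(S, m, DB_LEN, search_space=70 * DB_LEN)
    print(f"m={m:5d}  E={e_plain:.3g}  E(fixed space)={e_fixed:.3g}")

# Scaling the -e cutoff with query length (hypothetical base: -e 10 for a
# 70mer) keeps the minimum reportable score the same for every query.
BASE_E_PER_BASE = 10 / 70
for m in (70, 700, 7000):
    cutoff = BASE_E_PER_BASE * m
    min_s = score_for_evalue(cutoff, m, DB_LEN)
    print(f"m={m:5d}  -e {cutoff:.3g}  min score={min_s:.2f}")
```

In this simplified model, fixing the search space and scaling -e linearly with query length are two ways of holding the raw-score threshold constant. With BLAST's real length adjustments and gapped statistics the correspondence will only be approximate.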