[Bioclusters] sensitivity & blast
Chris Dwan
cdwan at bioteam.net
Wed Apr 6 16:58:36 EDT 2005
BLAST is not a black box, and its function need not be determined by
experiment:
- An excellent reference on the algorithm:
http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
- The source code: ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/ncbi.tar.Z
- O'Reilly published an entire book on BLAST, whose author is active on
this list.
Yes, the search space defaults to the product of the query length (m)
and the target set length (n). The -Y option overrides that search
space.
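To illustrate the two cases, here is a minimal Python sketch. The function name `search_space`, its `y` parameter (a stand-in for the -Y value) and the database length are hypothetical. The sketch also skips the edge-effect correction that real BLAST applies, which shortens m and n by an estimated HSP length before multiplying them.

```python
from typing import Optional


def search_space(query_len: int, db_len: int, y: Optional[float] = None) -> float:
    """Search space used for the E-value calculation (simplified sketch).

    Default: query length (m) times total database length (n).
    If y is given (hypothetical stand-in for a -Y value), it replaces
    m * n outright, so the result no longer depends on query_len.
    NOTE: real BLAST first applies an edge-effect length correction
    to m and n; that step is deliberately omitted here.
    """
    if y is not None:
        return float(y)
    return float(query_len) * float(db_len)


db_len = 1_000_000  # hypothetical total database length, in residues
for m in (100, 500):
    # default search space grows with m; the overridden one stays fixed
    print(m, search_space(m, db_len), search_space(m, db_len, y=12345))
```

With the default, the search space (and so the E value) scales with the query length. With the override it is the same constant for every query.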
The raw alignment score depends only on the alignment itself and the
scoring system (substitution matrix and gap penalties).
The bit score normalizes out the values specific to that scoring system.
The Expect value then factors in the query and target set sizes.
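The chain from raw score to bit score to Expect value follows the standard Karlin-Altschul formulas: S' = (lambda*S - ln K) / ln 2 and E = m*n*2^(-S'). Below is a minimal Python sketch of those two formulas. The function names are hypothetical, no edge-effect correction is applied, and the lambda and K values are illustrative only. Check the actual parameters for your matrix and gap costs.

```python
import math


def bit_score(raw: float, lam: float, k: float) -> float:
    """Bit score S' = (lambda*S - ln K) / ln 2.

    lambda and K depend on the scoring system (matrix plus gap costs);
    expressing the score in bits removes that dependence.
    """
    return (lam * raw - math.log(k)) / math.log(2)


def expect(bits: float, space: float) -> float:
    """E = m * n * 2**(-S'), where `space` stands in for m * n."""
    return space * 2.0 ** (-bits)


# Illustrative parameters only (roughly those often quoted for BLOSUM62
# with gap costs 11/1); they are assumptions, not authoritative values.
LAM, K = 0.267, 0.041
raw = 100
bits = bit_score(raw, LAM, K)
for space in (1e8, 5e8):
    # same bit score, different search space -> E scales with the space
    print(f"bits={bits:.1f}  space={space:.0e}  E={expect(bits, space):.3g}")
```

Since 2^(-S') = K*e^(-lambda*S), this is the same as E = K*m*n*e^(-lambda*S). The bit score does not change as the search space grows, while E grows in direct proportion to it.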
Keep in mind as well: BLAST is a heuristic algorithm with no
knowledge of any structure beyond primary sequence. If increased
sensitivity is the goal, you will get much greater mileage by using an
algorithm that takes structure into account, or one that uses more
than pairwise alignments.
However, taken very literally, your answer is correct. If the goal is
to remove query length as a factor in E value, the "-Y" option is the
way to go.
-Chris Dwan
The BioTeam
On Apr 6, 2005, at 4:39 PM, Pamela Culpepper wrote:
> orks as follows.
> In the absence of -Y, the "effective search space" is the product of
> the query sequence length and the total database length. It affects
> the calculation of the expectation value but not the score.
> It will thus vary with the query sequence length.
> Using "-Y 12345" sets the above "effective search space" to 12345,
> constant for each query sequence. To make the