So let me start out an answer like any good one should, with a
reference: http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
This is from the NCBI's site, which has a *lot* of educational links about
various aspects of this side of computational biology.

That link is, I feel, fairly non-technical, with as few equations as
possible, while still retaining a good amount of specific detail.  If you
want the more mathematically rigorous explanations, search for papers by
Altschul, Lipman, Gish, Waterman, Karlin, Pearson, etc., as they've done a
lot of important work in this area.

But to quickly answer your questions on E-values: the E-value you get
depends on the alignment score, the length and complexity of the query
sequence, and the size of the database.  Looking only at the percent
identity is generally not a good way to judge matches.  I believe older
versions of BLAST reported P-values and the more recent ones report
E-values; the two are essentially the same once the values are small
enough (below about 0.01).  The P-value is interpreted as the probability
that a match scoring at least this well could occur by chance, and the
E-value as the number of such matches you'd expect to see by chance in a
search of this size.
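
To see why P-values and E-values are nearly interchangeable for
significant hits: the tutorial linked above models chance hits as roughly
Poisson-distributed, which gives P = 1 - exp(-E).  Here is a minimal Python
sketch of that conversion.  The function name is my own, not anything that
ships with BLAST.

```python
import math

def evalue_to_pvalue(evalue):
    """Probability of at least one chance hit, assuming the number of
    chance hits is Poisson-distributed with mean equal to the E-value."""
    # -expm1(-E) is 1 - exp(-E), computed so that tiny E-values do not
    # get rounded away to zero.
    return -math.expm1(-evalue)

# Compare the two measures over a range of E-values.
for e in (10.0, 1.0, 0.01, 1e-20):
    print(f"E = {e:g}  ->  P = {evalue_to_pvalue(e):.3g}")
```

At E = 1 the P-value is about 0.63, so the two differ noticeably there,
but by E = 0.01 they already agree to within about half a percent.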

And on selecting a return threshold: I don't remember an option on the
web server for that.  There is an E-value cutoff (in the most general
case, I'll believe hits with E-values of 1e-20 or smaller), but again, it
doesn't necessarily follow that 100% identity == great E-value.
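
If all you want is the 100%-identity hits, one option is to filter the
results afterwards instead of going through the E-value.  Below is a
minimal Python sketch, assuming you have saved the hits in BLAST's
tab-delimited tabular output with the default column order (query id,
subject id, percent identity, alignment length, mismatches, gap opens,
query start/end, subject start/end, E-value, bit score).  The function
name, the min_length parameter, and the two sample lines are made up for
illustration.

```python
def perfect_hits(lines, min_length=0):
    """Yield (query, subject, length, evalue) for 100%-identity hits.

    Assumes the default 12-column tabular layout; adjust the indices
    if your output uses a different column order.
    """
    for line in lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        pident = float(fields[2])   # percent identity
        length = int(fields[3])     # alignment length
        evalue = float(fields[10])  # E-value
        if pident == 100.0 and length >= min_length:
            yield query, subject, length, evalue

# Made-up sample: a short perfect match and a long near-perfect match.
sample = [
    "q1\ts1\t100.00\t25\t0\t0\t1\t25\t101\t125\t2e-06\t50.1",
    "q1\ts2\t98.50\t200\t3\t0\t1\t200\t1\t200\t1e-100\t370",
]
for hit in perfect_hits(sample):
    print(hit)
```

In the made-up sample the short perfect match has the weaker E-value of
the two, which is the point above.  Note also that 100% identity only
covers the aligned region, so min_length is there if you want to insist
on a minimum alignment length.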

Frederick Tan

PS. For the most accurate definitions, go to the references.

On Tue, 19 Feb 2002, NewGene wrote:

>         If I do a blast search and want only hits with 100% identity 
> between query and subject, how can I specify the e-value or other 
> parameters for this purpose? I am not familiar with the underlying 

