l x yi said
> It can be assumed that the sequences
> in the databank are independent sequences, but if we
> are using the same sequence as query each
> time, wouldn't the scores obtained be dependent?

Actually the first assumption is bad: databases are full of repeated and nearly repeated sequences. They are far from independent draws from the usually assumed null models.

There are some query dependencies that have been studied. I know that the authors of BLAST have published and implemented length corrections to their calibration for short queries.

Statisticians like to assume independence, because without it the math often becomes intractable. Independence is rarely really present. The question is how much error the independence assumption introduces, and whether computing E-values gives a better or worse view of the results than not computing them.

In the case of sequence alignments, the independence assumption is not too terrible, and calibration does improve the interpretability of results. There are known artifacts (such as composition bias and over-sensitivity to low-entropy sequences), which reflect weaknesses in the null model used.

------------------------------
Kevin Karplus  karplus at soe.ucsc.edu  http://www.soe.ucsc.edu/~karplus
Senior member, IEEE
Board of Directors, ISCB (starting Jan 2005)
Professor of Biomolecular Engineering, University of California, Santa Cruz
Undergraduate and Graduate Director, Bioinformatics
Affiliations for identification only.