[BiO BB] RE: Some questions of alignment

Zhang, Yuanji yjzhang at noble.org
Mon Jun 21 11:47:51 EDT 2004

Dear Evgeny,

Thank you for your response. I am afraid 'alignment' was not well defined in
my original post. What I really mean is the possible distribution of
mismatches (M) along the length (L) of two aligned sequences. The two
sequences are identical except at the mismatch positions. So when M
= 0, there is only one possible alignment, and when M = 1, there are L
alignments (the mismatch can fall at any of the L positions). I think the
number of possible alignments is C(L,M), but I am not sure.
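
To check the C(L,M) guess on small cases, here is a minimal Python sketch that
enumerates every placement of M mismatches over L positions and compares the
count with the binomial coefficient. The function names and the values L = 6,
M = 2 are only illustrative. It counts positions only; if it also matters which
base is substituted at each mismatch, the total would be larger.

```python
from itertools import combinations
from math import comb


def mismatch_patterns(length, mismatches):
    """Return every mismatch placement as a sorted tuple of 0-based positions."""
    return list(combinations(range(length), mismatches))


def count_patterns(length, mismatches):
    """Closed-form count of placements: the binomial coefficient C(L, M)."""
    return comb(length, mismatches)


# Illustrative values only (assumption): a 6 nt alignment with 2 mismatches.
L, M = 6, 2
patterns = mismatch_patterns(L, M)
print(len(patterns), count_patterns(L, M))  # prints "15 15"
```

For M = 0 the enumeration gives a single empty pattern, and for M = 1 it gives
L patterns, which agrees with the two cases above.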

About alignments that blastn does not detect: there are several cases. Case 1
is that the mismatches are distributed in such a way that no seed alignment (7
or more identical nt in a row) can be found. Case 2 is that a run of
consecutive mismatches reduces the alignment score so much that blast will not
extend the alignment across them. There might be other cases too. So the
number of undetected alignments is a function of the word size, the pattern of
mismatch distribution, and the mismatch penalty and match reward scoring.
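
The two cases can be illustrated with a small Python sketch. This is a toy
model, not how blastn is actually implemented: count_seedless brute-forces the
mismatch patterns that leave no exact run of word_size matches (case 1), and
extend_right is a simplified ungapped extension that gives up once the score
falls more than x_drop below the best score seen (case 2). All function names,
and the reward, penalty and x_drop values, are assumptions chosen for the
example rather than blastn defaults.

```python
from itertools import combinations


def longest_match_run(length, mismatch_positions):
    """Length of the longest stretch of identical positions between mismatches."""
    longest, previous = 0, -1
    # Treat the end of the alignment as a sentinel mismatch at index `length`.
    for pos in list(mismatch_positions) + [length]:
        longest = max(longest, pos - previous - 1)
        previous = pos
    return longest


def count_seedless(length, mismatches, word_size):
    """Case 1: count mismatch patterns with no exact run of word_size matches."""
    return sum(
        1
        for pattern in combinations(range(length), mismatches)
        if longest_match_run(length, pattern) < word_size
    )


def extend_right(match_flags, reward=1, penalty=-3, x_drop=5):
    """Case 2: toy ungapped extension; returns how many positions are kept."""
    score = best = kept = 0
    for i, is_match in enumerate(match_flags):
        score += reward if is_match else penalty
        if score > best:
            best, kept = score, i + 1
        elif best - score > x_drop:
            break  # score fell too far below the best; stop extending
    return kept


# L = 20, M = 2, word size 7: only mismatches at positions 6 and 13 hide every seed.
print(count_seedless(20, 2, 7))  # prints 1 (out of C(20,2) = 190 patterns)
# Two mismatches in a row stop the extension after the first 3 matches.
print(extend_right([True] * 3 + [False] * 2 + [True] * 10))  # prints 3
# A single mismatch is crossed and the whole stretch of 14 positions is kept.
print(extend_right([True] * 3 + [False] * 1 + [True] * 10))  # prints 14
```

Changing word_size, penalty or x_drop in the sketch changes which patterns are
missed, which is the dependence described above.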

