[Biodevelopers] Blast not symmetrical?
Martin Heusel
mheusel at gmail.com
Sun Jan 21 04:01:57 EST 2007
On 19/01/07, Michael Nuhn <nuhn at rhrk.uni-kl.de> wrote:
> The link you sent shows how the bit score (S') is derived from the raw score
> (S):
>
> S' = ( lambda * S - ln K) / ln2
>
> Where the value of lambda is only derived from the scoring matrix and K is a
> constant that I don't understand.
>
> Where does the background distribution of the amino acid (or in my case DNA)
> sequence of the query come in?
Hi Michael and everyone,
I don't know how things work out for DNA sequences, but for proteins
the background frequencies enter through the raw score S. The raw score
S is the sum of the scores of all HSPs (High-scoring Segment Pairs)
between the query and the sequence under consideration. The score of an
HSP is the sum of the pairwise scores of all aligned AAs in that HSP.
The pairwise scores come from a substitution matrix such as BLOSUM or
PAM. The pairwise score Spw between AAs i and j is in turn computed as
the log-odds ratio of the target frequency to the product of the
background frequencies
Spw_ij = ln( Q_ij / (P(i) P(j)) ) / λ
where Q_ij is the target frequency derived from the respective
substitution model (PAM, BLOSUM etc.) and P(i) and P(j) are the
overall background frequencies of AAs i and j.
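
To make the log-odds definition concrete, here is a minimal Python sketch. The function name and the numbers passed to it are made up for illustration; they are not taken from any real BLOSUM or PAM matrix.

```python
import math

def log_odds_score(q_ij, p_i, p_j, lam):
    """Pairwise score: ln(Q_ij / (P(i) P(j))) / lambda.

    q_ij     -- target frequency of the aligned pair (i, j)
    p_i, p_j -- background frequencies of residues i and j
    lam      -- the scale factor lambda
    """
    return math.log(q_ij / (p_i * p_j)) / lam

# Hypothetical values: the pair occurs 4 times more often in true
# alignments (0.02) than expected by chance (0.1 * 0.05 = 0.005).
score = log_odds_score(q_ij=0.02, p_i=0.1, p_j=0.05, lam=0.3)
print(score)
```

A pair that is more frequent in real alignments than chance predicts gets a positive score, a pair that is rarer gets a negative one, and 1/λ only sets the scale.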
For λ the equation
sum_{i,j} P(i) P(j) exp(λ Spw_ij) = 1
must hold.
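
The condition on λ can be solved numerically, which also shows where the background distribution enters when the scores are fixed in advance. The following is only a sketch under assumptions of mine: a DNA alphabet with uniform base frequencies of 0.25, an illustrative +1/-3 match/mismatch scheme, and plain bisection as the root finder. It relies on the expected score being negative and at least one score being positive, so that a single positive root exists.

```python
import math

def solve_lambda(p, s, tol=1e-12):
    """Find the positive lambda with sum_{i,j} p[i] p[j] exp(lambda s[i][j]) = 1.

    p -- dict of background frequencies per letter
    s -- nested dict of pairwise scores, s[i][j]
    """
    letters = list(p)

    def f(lam):
        total = sum(p[a] * p[b] * math.exp(lam * s[a][b])
                    for a in letters for b in letters)
        return total - 1.0

    # f(0) = 0 and f is negative just above 0 (negative expected score),
    # so grow the upper bound until f becomes positive, then bisect.
    lo, hi = 0.0, 1.0
    while f(hi) <= 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Assumed example: uniform base frequencies, match = +1, mismatch = -3.
p = {b: 0.25 for b in "ACGT"}
s = {a: {b: (1 if a == b else -3) for b in "ACGT"} for a in "ACGT"}
lam = solve_lambda(p, s)
print(lam)
```

Changing the frequencies in p changes the resulting λ, so for a fixed scoring scheme the background distribution reaches the bit score through λ.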
The above can be found here
http://blast.wustl.edu/doc/infotheory.html
and for BLAST specifically
http://blast.wustl.edu/doc/infotheory.html#KAStats
hth
Martin
--
+ gpg : http://user.cs.tu-berlin.de/~mhe/pub/martin.gpg
+ gpg fp: 4844 71B5 B4E4 3892 69CA 6EA5 6598 61BE 0021 94A2
+ http://ni.cs.tu-berlin.de/
+ In the beginning was the WORD, and the WORD was UNSIGNED,
+ and the main(){} was without form and void