[Biodevelopers] Blast not symmetrical?

Sun Jan 21 04:01:57 EST 2007

On 19/01/07, Michael Nuhn <nuhn at rhrk.uni-kl.de> wrote:

> The link you sent shows how the bit score (S') is derived from the raw score
> (S):
>
> S' = ( lambda * S - ln K) / ln2
>
> Where the value of lambda is only derived from the scoring matrix and K is a
> constant that I don't understand.
>
> Where does the background distribution of the amino acid (or in my case DNA)
> sequence of the query come in?

Hi Michael and everyone,

I don't know how things work out for DNA sequences, but for proteins
the background frequency enters through the raw score S. The raw score
S is the sum of the scores of all HSPs (High Scoring Pairs) between the
query and a considered sequence. The score of an HSP is the sum of the
pairwise scores of all aligned AAs in that HSP. The pairwise scores come
from a substitution matrix such as BLOSUM or PAM. The pairwise score
Spw_ij between AAs i and j is in turn the log-odds ratio of the target
frequency to the product of the background frequencies, scaled by λ:

     Spw_ij = ln( Q_ij / (P(i) P(j)) ) / λ

where Q_ij is the target frequency derived from the respective
substitution model (PAM, BLOSUM etc.) and P(i) and P(j) are the overall
background frequencies of AAs i and j.
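
To make the role of the background frequencies concrete, here is a
minimal Python sketch of the log-odds score above and of an ungapped HSP
score as a sum of pairwise scores. The two-letter alphabet, the numbers
in p and q, and the function names are made up for illustration; they
are not taken from any real matrix or from the BLAST source.

```python
import math

def log_odds_score(q_ij, p_i, p_j, lam):
    """Pairwise score Spw_ij = ln(Q_ij / (P(i) P(j))) / lambda."""
    return math.log(q_ij / (p_i * p_j)) / lam

def hsp_score(seq_a, seq_b, scores):
    """Score of an ungapped HSP: sum of pairwise scores of aligned letters."""
    if len(seq_a) != len(seq_b):
        raise ValueError("an ungapped HSP pairs segments of equal length")
    return sum(scores[(a, b)] for a, b in zip(seq_a, seq_b))

# Hypothetical toy model: background frequencies P and target frequencies Q.
p = {"A": 0.5, "B": 0.5}
q = {("A", "A"): 0.4, ("B", "B"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1}

# Assumption: lambda = ln 2, which expresses the scores in bits.
lam = math.log(2)

scores = {pair: log_odds_score(q_ij, p[pair[0]], p[pair[1]], lam)
          for pair, q_ij in q.items()}

# Pairs seen more often than chance (Q_ij > P(i) P(j)) score positive,
# pairs seen less often than chance score negative.
example = hsp_score("AAB", "ABB", scores)
```

Changing p while keeping q fixed changes every entry of scores, which is
where the background distribution reaches the raw score.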

For λ the equation

 sum_i,j P(i) P(j) exp(λ Spw_ij) = 1

must hold.
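
For a given score matrix this condition can be solved numerically for λ.
The following is only a sketch, assuming simple bisection and a DNA-like
match/mismatch scoring of +1/-3; the function name solve_lambda and the
skewed frequencies are my own choices for illustration, not how BLAST
itself does it.

```python
import math

def solve_lambda(p, s, tol=1e-12):
    """Find the positive lambda with sum_ij P(i) P(j) exp(lambda * s_ij) = 1."""
    def f(lam):
        return sum(p[i] * p[j] * math.exp(lam * s[(i, j)])
                   for i in p for j in p) - 1.0

    expected = sum(p[i] * p[j] * s[(i, j)] for i in p for j in p)
    if expected >= 0 or max(s.values()) <= 0:
        raise ValueError("need a negative expected score and a positive score")

    # f(0) = 0 and f dips below zero first, so grow hi until f(hi) > 0.
    lo, hi = 0.0, 1.0
    while f(hi) <= 0:
        hi *= 2.0
    # Bisection: f < 0 between 0 and the root, f > 0 beyond it.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0

# Assumed scoring: +1 for a match, -3 for a mismatch.
letters = "ACGT"
s = {(i, j): (1 if i == j else -3) for i in letters for j in letters}

uniform = {c: 0.25 for c in letters}
skewed = {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1}  # hypothetical AT-rich case

lam_uniform = solve_lambda(uniform, s)
lam_skewed = solve_lambda(skewed, s)
```

With the same score matrix, the two background distributions give
different values of λ, so the bit score depends on the background
frequencies through λ as well.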

The above can be found here

http://blast.wustl.edu/doc/infotheory.html

and for BLAST specifically

http://blast.wustl.edu/doc/infotheory.html#KAStats

hth

  Martin

-- 
+ gpg   : http://user.cs.tu-berlin.de/~mhe/pub/martin.gpg
+ gpg fp: 4844 71B5 B4E4 3892 69CA  6EA5 6598 61BE 0021 94A2
+ http://ni.cs.tu-berlin.de/

+ In the beginning was the WORD, and the WORD was UNSIGNED,
+ and the main(){} was without form and void