[Biococoa-dev] starting BCAlignment

Philipp Seibel biococoa at bioworxx.com
Thu Mar 10 16:33:33 EST 2005


On 10.03.2005, at 22:20, Alexander Griekspoor wrote:

> On 10-Mar-05, at 22:05, Philipp Seibel wrote:
>
>>> One more thing about matrices, to explain a bit more for John:
>>> In the DNA world, a (very simple) scoring scheme can be:
>>> a match is positive, e.g. +1
>>> a mismatch is negative, e.g. -1
>>> A simple char comparison is all it takes to get the score.
>>> In the protein world there is more information, because a change from 
>>> amino acid X to Y can matter more or less depending on whether the two 
>>> belong to the same chemical class. Based on analysis of mutations in 
>>> many sequences, people have created substitution matrices with this 
>>> in mind (examples are PAM and BLOSUM). Because these matrices are 
>>> accessed for every score, for performance reasons they are usually 
>>> of type int** (or char**, which amounts to the same thing).
>>>
>> I think we should use an int* instead of an int** because it's faster. 
>> Take a look at my BCScoringMatrix.
>
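To make the int* versus int** point concrete, here is a minimal sketch in plain C of the two layouts. It is not the actual BCScoringMatrix code; the names rows_create, rows_destroy, flat_create and flat_score, and the size of 128, are assumptions chosen for illustration. Whether the flat version is really faster would still have to be measured.

```c
#include <assert.h>
#include <stdlib.h>

enum { SIZE = 128 };   /* assumption: 7-bit ASCII codes index the matrix */

/* int** layout: SIZE separately allocated rows; a lookup follows two pointers. */
static int **rows_create(int match, int mismatch) {
    int **m = malloc(SIZE * sizeof *m);
    if (m == NULL) return NULL;
    for (int a = 0; a < SIZE; a++) {
        m[a] = malloc(SIZE * sizeof **m);
        if (m[a] == NULL) {              /* undo the partial allocation */
            while (a-- > 0) free(m[a]);
            free(m);
            return NULL;
        }
        for (int b = 0; b < SIZE; b++)
            m[a][b] = (a == b) ? match : mismatch;
    }
    return m;
}

static void rows_destroy(int **m) {
    for (int a = 0; a < SIZE; a++) free(m[a]);
    free(m);
}

/* int* layout: one contiguous block; entry (a, b) sits at a * SIZE + b. */
static int *flat_create(int match, int mismatch) {
    int *m = malloc((size_t)SIZE * SIZE * sizeof *m);
    if (m == NULL) return NULL;
    for (int a = 0; a < SIZE; a++)
        for (int b = 0; b < SIZE; b++)
            m[a * SIZE + b] = (a == b) ? match : mismatch;
    return m;
}

/* Look up the score for two character codes in the flat layout. */
static int flat_score(const int *m, unsigned char a, unsigned char b) {
    assert(a < SIZE && b < SIZE);
    return m[a * SIZE + b];
}
```

Both layouts return the same scores; the flat one needs a single allocation and a single free, and keeps all entries next to each other in memory. A fixed-size two-dimensional array such as int v[128][128] is also stored contiguously, so the difference mainly concerns heap-allocated int** matrices.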
> You're the expert! ;-)
> I came across this example code, which I thought was quite elegant.
> Generation of a (DNA) scoring matrix:
>
>     match = 1;
>     mismh = -1;
>     /* set match and mismatch weights */
>     for (i = 0; i < 128; i++)
>         for (j = 0; j < 128; j++)
>             if (i == j) v[i][j] = match;
>             else v[i][j] = mismh;
>
>     v['N']['N'] = mismh;
>     v['n']['n'] = mismh;
>     v['A']['a'] = v['a']['A'] = match;
>     v['C']['c'] = v['c']['C'] = match;
>     v['G']['g'] = v['g']['G'] = match;
>     v['T']['t'] = v['t']['T'] = match;
>
> So you simply build a 128x128 char matrix, using the fact that chars 
> are ints.
> Next, to calculate the score:
>
>  /* assuming A and B are char arrays and c points into the output line */
>  char a = A[++i];   /* character i in sequence A */
>  char b = B[++j];   /* character j in sequence B */
>  /* insert a '|' in the case of a match and a space in the case of a mismatch */
>  *c++ = (a == b || (isdna && v[a][b] == MATCHSC)) ? '|' : ' ';
>

I think it's quite a good approach, but we have to decide whether we 
want to "ask" the matrix with two BCSymbols or just with chars. Take a 
look at my recent implementation of the scoring matrix. It's perhaps 
slower than this one, but more convenient. I think we just have to 
test the performance once we've finished the first algorithm.
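
To illustrate the two ways of asking the matrix, here is a rough sketch in plain C. The Symbol and ScoringMatrix structs are hypothetical stand-ins, not the real BCSymbol or BCScoringMatrix classes, and all function names are made up for this example. The idea is that the symbol-based call can simply delegate to the char-based lookup, so both interfaces can coexist.

```c
#include <assert.h>
#include <ctype.h>

enum { ALPHABET = 128 };   /* assumption: sequences use 7-bit ASCII codes */

typedef struct {
    int scores[ALPHABET * ALPHABET];   /* flat table, row-major */
} ScoringMatrix;                       /* hypothetical, not BCScoringMatrix */

typedef struct {
    char code;                         /* one-letter code, e.g. 'A' */
} Symbol;                              /* hypothetical stand-in for BCSymbol */

/* Fill the table with the simple DNA scheme from the quoted example. */
static void dna_matrix_init(ScoringMatrix *m, int match, int mismatch) {
    for (int a = 0; a < ALPHABET; a++)
        for (int b = 0; b < ALPHABET; b++)
            m->scores[a * ALPHABET + b] = (a == b) ? match : mismatch;

    /* upper- and lower-case versions of the same base count as a match */
    const char bases[] = "ACGT";
    for (int k = 0; bases[k] != '\0'; k++) {
        int up = bases[k];
        int lo = tolower(bases[k]);
        m->scores[up * ALPHABET + lo] = match;
        m->scores[lo * ALPHABET + up] = match;
    }

    /* N (unknown base) never matches, not even itself */
    m->scores['N' * ALPHABET + 'N'] = mismatch;
    m->scores['n' * ALPHABET + 'n'] = mismatch;
}

/* Fast path: ask the matrix with two raw chars. */
static int score_chars(const ScoringMatrix *m, char a, char b) {
    unsigned char ua = (unsigned char)a;
    unsigned char ub = (unsigned char)b;
    assert(ua < ALPHABET && ub < ALPHABET);
    return m->scores[ua * ALPHABET + ub];
}

/* Convenience path: ask with two symbols; delegates to the char lookup. */
static int score_symbols(const ScoringMatrix *m, Symbol a, Symbol b) {
    return score_chars(m, a.code, b.code);
}
```

With this split, an inner alignment loop could call score_chars directly, while other code uses score_symbols. How much the extra indirection costs in a real Objective-C implementation is exactly what the performance test would have to show.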

> Again, my experience is pretty limited, so I readily believe you 
> that a simple int array is faster than a matrix, and certainly 
> much simpler!!

My experience is limited to several Java alignment implementations, so 
I've never done this with a good programming language ;-)

Phil

> Cheers,
> Alex