[Biococoa-dev] starting BCAlignment

Alexander Griekspoor a.griekspoor at nki.nl
Thu Mar 10 16:20:47 EST 2005


On 10 Mar 2005, at 22:05, Philipp Seibel wrote:

>> Now one more thing about matrices, to explain a bit more to John:
>> You can imagine that in the DNA world a (very simple) scoring scheme 
>> can be:
>> a match positive, e.g. +1
>> a mismatch negative, e.g. -1
>> A simple char comparison is all it takes to get the score.
>> But in the protein world there's more information, as the change from 
>> amino acid X to Y can be more or less important depending on whether 
>> they belong to the same chemical class. Based on analysis of 
>> mutations in many sequences, people have created substitution 
>> matrices with this point in mind (examples are PAM and BLOSUM). As 
>> these matrices have to be accessed for every score, for performance 
>> reasons they are usually of type int** (or char**, but that's the 
>> same).
>>
> I think we should use an int* instead of an int** because it's faster. 
> Take a look at my BCScoringMatrix.

You're the expert! ;-)
I came across this example code, which I thought was quite elegant.
Generation of a (DNA) scoring matrix:

	    match = 1;
	    mismh = -1;
	    /* set match and mismatch weights */
	    for ( i = 0; i < 128 ; i++ )
	      for ( j = 0; j < 128 ; j++ )
	         if (i == j ) v[i][j] = match;
	         else v[i][j] = mismh;

	    v['N']['N'] = mismh;
	    v['n']['n'] = mismh;
	    v['A']['a'] = v['a']['A'] = match;
	    v['C']['c'] = v['c']['C'] = match;
	    v['G']['g'] = v['g']['G'] = match;
	    v['T']['t'] = v['t']['T'] = match;

So you simply build a 128x128 matrix indexed by characters, using the 
fact that chars are ints.
Next, to use it when comparing two characters:

  char a = A[++i];	// character i in sequence A
  char b = B[++j];	// character j in sequence B
  // write a '|' in the case of a match and a space in the case of a mismatch
  *c++ = (a == b || (isdna && v[a][b] == MATCHSC)) ? '|' : ' ';
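To make the two fragments above easier to try out, here is a minimal sketch that puts them together. The function names (init_dna_matrix, pair_score, match_line) are my own and not from the original code, the isdna flag is dropped because the sketch assumes DNA input, and characters are masked to 7 bits so that the lookup stays inside the 128x128 table.

```c
#include <assert.h>
#include <string.h>

#define ALPHABET 128            /* 7-bit ASCII, as in the snippet above */

static int v[ALPHABET][ALPHABET];

/* Fill v with `match` on the diagonal and `mismh` elsewhere, then patch in
   the N/N mismatch and the upper/lower case matches for A, C, G and T. */
static void init_dna_matrix(int match, int mismh)
{
    static const char bases[] = "ACGT";

    for (int i = 0; i < ALPHABET; i++)
        for (int j = 0; j < ALPHABET; j++)
            v[i][j] = (i == j) ? match : mismh;

    v['N']['N'] = mismh;
    v['n']['n'] = mismh;

    for (size_t k = 0; bases[k] != '\0'; k++) {
        int up = bases[k];
        int lo = up - 'A' + 'a';
        v[up][lo] = v[lo][up] = match;
    }
}

/* Look up the score of one character pair (hypothetical helper). */
static int pair_score(char a, char b)
{
    return v[(unsigned char)a & 0x7f][(unsigned char)b & 0x7f];
}

/* Write '|' for matching columns and ' ' otherwise, using the same test
   as the snippet above (identical characters always count as a match).
   a and b must have equal length; out needs strlen(a) + 1 bytes. */
static void match_line(const char *a, const char *b, char *out, int match)
{
    size_t n = strlen(a);
    assert(strlen(b) == n);
    for (size_t k = 0; k < n; k++)
        out[k] = (a[k] == b[k] || pair_score(a[k], b[k]) == match) ? '|' : ' ';
    out[n] = '\0';
}
```

After init_dna_matrix(1, -1), calling match_line("ACgTN", "AcGAN", out, 1) fills out with "||| |". Note that N against N still gets a '|' because the characters are identical, even though its score in the matrix is the mismatch score.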

Again, my experience is pretty limited, so I readily believe you that 
using a simple int array is faster than a matrix, and certainly 
much simpler!!
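For comparison, this is roughly what I understand the int* idea to be: a rough sketch only, with made-up names (flat_matrix_create, flat_score) that are not the actual BCScoringMatrix interface. All 128x128 scores live in one contiguous block and a pair is found with row * 128 + column, so there is no second pointer to follow as with an int** whose rows are allocated separately.

```c
#include <assert.h>
#include <stdlib.h>

#define ALPHABET 128            /* 7-bit ASCII */

/* Allocate one contiguous block of ALPHABET * ALPHABET ints and fill it
   with `match` on the diagonal and `mismh` elsewhere.
   Returns NULL if the allocation fails; the caller frees the result. */
static int *flat_matrix_create(int match, int mismh)
{
    int *m = malloc(sizeof *m * ALPHABET * ALPHABET);
    if (m == NULL)
        return NULL;

    for (int i = 0; i < ALPHABET; i++)
        for (int j = 0; j < ALPHABET; j++)
            m[i * ALPHABET + j] = (i == j) ? match : mismh;
    return m;
}

/* Score of the pair (a, b): a single index computation into the block.
   Characters are masked to 7 bits to stay inside the table. */
static int flat_score(const int *m, char a, char b)
{
    unsigned x = (unsigned char)a & 0x7f;
    unsigned y = (unsigned char)b & 0x7f;
    return m[x * ALPHABET + y];
}
```

A fixed-size int v[128][128] is already contiguous in memory, so the difference mainly shows up against an int** built from separately allocated rows. Whether it is measurably faster in practice would have to be timed.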
Cheers,
Alex

*********************************************************
                     ** Alexander Griekspoor **
*********************************************************
               The Netherlands Cancer Institute
               Department of Tumorbiology (H4)
          Plesmanlaan 121, 1066 CX, Amsterdam
                   Tel:  + 31 20 - 512 2023
                   Fax:  + 31 20 - 512 2029
                   E-mail: a.griekspoor at nki.nl
	        AIM: mekentosj at mac.com
               Web: http://www.mekentosj.com

                  EnzymeX - To cut or not to cut
              http://www.mekentosj.com/enzymex

*********************************************************