[Biococoa-dev] starting BCAlignment
Philipp Seibel
biococoa at bioworxx.com
Thu Mar 10 16:33:33 EST 2005
On 10.03.2005 at 22:20, Alexander Griekspoor wrote:
> On 10-Mar-05, at 22:05, Philipp Seibel wrote:
>
>>> One more thing about matrices, to explain a bit more for John:
>>> In the DNA world, a (very simple) scoring scheme can be:
>>> a match scores positive, e.g. +1
>>> a mismatch scores negative, e.g. -1
>>> A simple char comparison is all it takes to get the score.
>>> In the protein world, however, there is more information: the change
>>> from amino acid X to Y can matter more or less depending on whether
>>> the two belong to the same chemical class. Based on analysis of
>>> mutations in many sequences, people have created substitution
>>> matrices with this in mind (examples are PAM and BLOSUM). Because
>>> these matrices must be consulted for every single score, for
>>> performance reasons they are usually of type int** (or char**, which
>>> works the same way).
>>>
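A minimal C sketch of the simple DNA scheme described above, assuming two already-aligned, ungapped sequences; `score_pair`, `score_ungapped`, `MATCH_SCORE` and `MISMATCH_SCORE` are hypothetical names for illustration, not BioCocoa API. Note that a plain comparison is case-sensitive, so 'A' against 'a' counts as a mismatch here.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative weights from the discussion: match +1, mismatch -1. */
#define MATCH_SCORE     1
#define MISMATCH_SCORE -1

/* Score one aligned pair: a plain character comparison is enough. */
static int score_pair(char a, char b)
{
    return (a == b) ? MATCH_SCORE : MISMATCH_SCORE;
}

/* Sum the pair scores of two ungapped sequences, stopping at the end
   of the shorter one. */
static int score_ungapped(const char *s, const char *t)
{
    size_t n = strlen(s);
    int total = 0;
    for (size_t k = 0; k < n && t[k] != '\0'; k++)
        total += score_pair(s[k], t[k]);
    return total;
}
```

For example, `score_ungapped("ACGT", "ACCT")` gives 1 + 1 - 1 + 1 = 2.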
>> I think we should use an int* instead of an int** because it's faster.
>> Take a look at my BCScoringMatrix.
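The usual reasoning behind preferring int*: an int** matrix is an array of row pointers, so each lookup first loads a row pointer and then the score, and the rows need not be contiguous in memory; an int* keeps all n*n scores in one block, addressed as `scores[i * n + j]`. Whether that is measurably faster depends on the compiler and access pattern. Below is a minimal C sketch of the flat layout only; it is not the code of BCScoringMatrix (which isn't shown here), and `FlatMatrix` and its functions are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical flat scoring matrix: one contiguous block of n*n ints,
   stored row-major so entry (i, j) lives at scores[i * n + j]. */
typedef struct {
    size_t n;
    int *scores;
} FlatMatrix;

/* Allocate an n x n matrix with `diag` on the diagonal and `off_diag`
   elsewhere; on allocation failure scores is NULL and n is 0. */
static FlatMatrix flat_matrix_new(size_t n, int diag, int off_diag)
{
    FlatMatrix m = { n, malloc(n * n * sizeof(int)) };
    if (m.scores == NULL) {
        m.n = 0;
        return m;
    }
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            m.scores[i * n + j] = (i == j) ? diag : off_diag;
    return m;
}

/* One index computation and one load; no row-pointer indirection. */
static int flat_matrix_get(const FlatMatrix *m, size_t i, size_t j)
{
    assert(i < m->n && j < m->n);
    return m->scores[i * m->n + j];
}

static void flat_matrix_free(FlatMatrix *m)
{
    free(m->scores);
    m->scores = NULL;
    m->n = 0;
}
```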
>
> You're the expert! ;-)
> I came across this example code, which I thought was quite elegant.
> Generation of a (DNA) scoring matrix:
>
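> int v[128][128];   /* scoring table indexed by character code */
> int i, j, match, mismh;
>
> /* set match and mismatch weights */
> match = 1;
> mismh = -1;
> for ( i = 0; i < 128 ; i++ )
>     for ( j = 0; j < 128 ; j++ )
>         if ( i == j ) v[i][j] = match;
>         else v[i][j] = mismh;
>
> /* N (unknown base) never counts as a match, not even against itself */
> v['N']['N'] = mismh;
> v['n']['n'] = mismh;
> /* upper- and lower-case bases match each other */
> v['A']['a'] = v['a']['A'] = match;
> v['C']['c'] = v['c']['C'] = match;
> v['G']['g'] = v['g']['G'] = match;
> v['T']['t'] = v['t']['T'] = match;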
>
> So you simply build a 128x128 matrix indexed by character, using the
> fact that chars are just small integers (their ASCII codes).
> Next, to use the matrix when comparing two aligned characters:
>
> char a = A[++i]; // character i in sequence A
> char b = B[++j]; // character j in sequence B
> // c points into the output "match line" buffer
> *c++ = (a == b || (isdna && v[(unsigned char)a][(unsigned char)b] == MATCHSC)) ? '|' : ' ';
> // code to insert a | in the case of a match and
> // a space in the case of a mismatch
>
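To see the table and the comparison working together, here is a self-contained C sketch that builds the same 128x128 table and produces the '|'/' ' match line for two already-aligned strings of equal length. `init_dna_table` and `match_line` are hypothetical names; the `match` argument plays the role of MATCHSC, and the input is assumed to be 7-bit ASCII.

```c
#include <string.h>

enum { ALPHABET = 128 };

/* Hypothetical helper: fills v with the quoted DNA scheme
   (identical chars match, N/n never match, case-insensitive ACGT). */
static void init_dna_table(int v[ALPHABET][ALPHABET], int match, int mismatch)
{
    for (int i = 0; i < ALPHABET; i++)
        for (int j = 0; j < ALPHABET; j++)
            v[i][j] = (i == j) ? match : mismatch;
    v['N']['N'] = v['n']['n'] = mismatch;
    v['A']['a'] = v['a']['A'] = match;
    v['C']['c'] = v['c']['C'] = match;
    v['G']['g'] = v['g']['G'] = match;
    v['T']['t'] = v['t']['T'] = match;
}

/* Write the match line for aligned strings a and b into out, which must
   hold strlen(a) + 1 chars. As in the quoted expression, identical
   characters always get '|', including N against N, even though the
   table scores that pair as a mismatch. Assumes 7-bit ASCII input. */
static void match_line(const char *a, const char *b, int is_dna,
                       int v[ALPHABET][ALPHABET], int match, char *out)
{
    char *c = out;
    for (size_t k = 0; a[k] != '\0' && b[k] != '\0'; k++) {
        unsigned char x = (unsigned char)a[k];
        unsigned char y = (unsigned char)b[k];
        *c++ = (x == y || (is_dna && v[x][y] == match)) ? '|' : ' ';
    }
    *c = '\0';
}
```

With the table built as `init_dna_table(v, 1, -1)`, `match_line("ACGTN", "acgAN", 1, v, 1, out)` yields `"||| |"`: the mixed-case pairs match through the table, T against A does not, and N against N gets a bar from the plain equality test.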
I think it's quite a good approach, but we have to decide whether we
want to "ask" the matrix with two BCSymbols or just with chars. Take a
look at my recent implementation of the scoring matrix. It's perhaps
slower than this one, but more convenient to use. I think we'll just
have to measure the performance once we've implemented the first
algorithm.
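One way to run the performance test suggested above, sketched in C: score the same residue pairs once through a char-based query and once through a symbol-based wrapper, and time both with `clock()`. `Symbol` is a hypothetical stand-in for BCSymbol; since a C struct access like this is likely to be inlined away, the sketch mainly shows the shape of the harness, and a real comparison would have to go through the Objective-C BCSymbol and BCScoringMatrix classes instead.

```c
#include <stddef.h>
#include <stdio.h>
#include <time.h>

enum { ALPHABET = 128 };

/* Hypothetical stand-in for a symbol object such as BCSymbol: just a
   struct carrying the one-letter code (no Objective-C messaging). */
typedef struct {
    char code;
} Symbol;

/* Char-based query: a direct table lookup (assumes 7-bit ASCII). */
static int score_chars(int v[ALPHABET][ALPHABET], char a, char b)
{
    return v[(unsigned char)a][(unsigned char)b];
}

/* Symbol-based query: unwrap each symbol, then do the same lookup. */
static int score_symbols(int v[ALPHABET][ALPHABET],
                         const Symbol *a, const Symbol *b)
{
    return score_chars(v, a->code, b->code);
}

/* Score n pairs `repeats` times through each interface, print the CPU
   time each loop took, and return 1 if both totals agree. */
static int compare_interfaces(int v[ALPHABET][ALPHABET], const Symbol *sa,
                              const Symbol *sb, size_t n, long repeats)
{
    long total_chars = 0, total_symbols = 0;

    clock_t t0 = clock();
    for (long r = 0; r < repeats; r++)
        for (size_t k = 0; k < n; k++)
            total_chars += score_chars(v, sa[k].code, sb[k].code);
    clock_t t1 = clock();
    for (long r = 0; r < repeats; r++)
        for (size_t k = 0; k < n; k++)
            total_symbols += score_symbols(v, &sa[k], &sb[k]);
    clock_t t2 = clock();

    printf("chars:   %.3f s\nsymbols: %.3f s\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC);
    return total_chars == total_symbols;
}
```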
> Again, my experience is pretty limited, so I'll take your word that a
> simple flat int array is faster than an int** matrix, and it's
> certainly much simpler!!
My experience is limited to several Java alignment implementations, so
I've never done this in a good programming language ;-)
Phil
> Cheers,
> Alex