[Biococoa-dev] BCSymbolMapping

Alexander Griekspoor a.griekspoor at nki.nl
Sun Mar 13 16:41:22 EST 2005


Hmmm, somehow I totally miss the reason for the remapping. Why would it be 
leaner/faster?
What's the difference between:
char c = ('a' == 'a') ? 'I' : 'X';
and:
char c = ('\x00' == '\x00') ? 'I' : 'X';
So in the example I borrowed from the sample code I used previously, 
the substitution matrix is a simple 128x128 char array and each 
character sits at the index given by its own character code.

>     match = 1;
>     mismh = -1;
>     /* set match and mismatch weights */
>     for ( i = 0; i < 128 ; i++ )
>       for ( j = 0; j < 128 ; j++ )
>         if ( i == j ) v[i][j] = match;
>         else v[i][j] = mismh;
>
>     v['N']['N'] = mismh;
>     v['n']['n'] = mismh;
>     v['A']['a'] = v['a']['A'] = match;
>     v['C']['c'] = v['c']['C'] = match;
>     v['G']['g'] = v['g']['G'] = match;
>     v['T']['t'] = v['t']['T'] = match;
>
> So, you simply build a 128x128 char matrix, using the fact that chars 
> are small ints and can index the array directly.
> Next, to calculate the score:
>
>  char a = A[++i];	// character i in sequence A
>  char b = B[++j];	// character j in sequence B
>  *c++ = (a == b || (isdna && v[a][b] == MATCHSC)) ? '|' : ' ';
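
For reference, the quoted fragments can be assembled into a small 
self-contained sketch of the ASCII-indexed approach. The names 
init_matrix and score_ungapped are made up for illustration and are not 
part of the sample code or of BioCocoa. The table uses signed char 
because plain char may be unsigned on some platforms, which would break 
the -1 mismatch weight.

```c
#include <assert.h>
#include <stddef.h>

enum { ALPHABET = 128, MATCH = 1, MISMATCH = -1 };

/* Score table indexed directly by 7-bit ASCII codes, as in the quoted sample. */
static signed char v[ALPHABET][ALPHABET];

/* Fill the table once: identical characters match, everything else mismatches. */
static void init_matrix(void)
{
    for (int i = 0; i < ALPHABET; i++)
        for (int j = 0; j < ALPHABET; j++)
            v[i][j] = (i == j) ? MATCH : MISMATCH;

    /* 'N' is an unknown base, so it never counts as a match, even against itself. */
    v['N']['N'] = v['n']['n'] = MISMATCH;

    /* Upper and lower case of the same base count as a match. */
    for (const char *p = "ACGT"; *p != '\0'; p++) {
        int up = *p;
        int lo = *p + ('a' - 'A');
        v[up][lo] = v[lo][up] = MATCH;
    }
}

/* Hypothetical helper: sum the per-position scores of two ASCII sequences,
   stopping at the end of the shorter one. No gaps are considered. */
static int score_ungapped(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    int total = 0;
    for (size_t k = 0; a[k] != '\0' && b[k] != '\0'; k++)
        total += v[a[k] & 0x7F][b[k] & 0x7F];   /* mask keeps the index in 0..127 */
    return total;
}
```

After a single call to init_matrix(), each comparison is one array 
lookup with no translation step in between.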

So again, if we convert the sequences to char arrays, why the remap? In 
the sample code above this 128x128 matrix is instantiated only once, 
takes up hardly any memory and avoids the time needed for the remap! 
So why the hassle for the few unused spots in the matrix? Is it really 
worth all the trouble going from a 128x128 array (we're talking about 
16 KB of RAM!) to a 16x16 array or so?
I understand the conversion from BCSequence to char array, but that can 
still be done with the normal chars, right? Or is the idea that when we 
do the conversion we can do the remap along with it? I'm just worried 
that the code will be harder to understand and much more error prone if 
we have to remap everything all the time.
And Koen has a point: can't we just add a method charRepresentation to 
BCSequence, for instance, which does the translation job (plus a 
sequenceFromCharArray or something)? No need for a translation object, 
right?
Again, perhaps I'm taking too many steps in the wrong direction at 
once...
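
To put numbers on the 128x128 versus 16x16 question, here is a minimal 
sketch of the two footprints. The type names are hypothetical; only the 
dimensions come from the discussion above.

```c
#include <stddef.h>

/* Table indexed by raw 7-bit ASCII codes, as in the sample code. */
typedef char AsciiMatrix[128][128];

/* Hypothetical compact table indexed by remapped symbol codes 0..15. */
typedef char CompactMatrix[16][16];

/* sizeof(char) is 1 by definition, so these are exact byte counts. */
static size_t ascii_matrix_bytes(void)   { return sizeof(AsciiMatrix); }
static size_t compact_matrix_bytes(void) { return sizeof(CompactMatrix); }
```

That is 16384 bytes against 256 bytes. Whether the smaller table pays 
for the extra remapping pass is something that would have to be 
measured; the arithmetic alone does not settle it.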
Alex




On 13 Mar 2005, at 22:15, Philipp Seibel wrote:

>
> On 13.03.2005 at 21:58, Koen van der Drift wrote:
>
>> Philipp,
>>
>> Could you expand a little on what this class would do? We already 
>> have BCSymbol methods to get a char for each symbol, so I am not sure 
>> what additional advantages your BCSymbolMapping class would have.
>>
>>
>> - Koen.
>>
> Sure Koen,
>
> Charles wrote (in the thread "symbol mapping for optimization", just 
> look in the list ;-) ):
>
> I think what is really not obvious at first, and that can be 
> confusing, is the separation between (1) and (2). It seems obvious 
> that a char should be the char corresponding to the BCSymbol, for 
> instance base 'A' should be mapped to char 'A'. Maybe we will do that 
> initially but we want to be able to modify that in the future, or even 
> to have more dynamic mapping depending on the context. For instance, 
> we might find later that mapping the bases ATGC to the chars 
> 0x00, 0x01, 0x02, 0x03 is much better than mapping to the 'ATGC' chars, 
> because we don't have useless chars in between each used char. We then 
> just have to modify the code in (2), and probably only one or two 
> lines of code, to propagate whatever optimization we make in the 
> translation to the whole framework.
>
> Phil (that's me ;-) ):
>
> So we need an optimal mapping for particular SymbolSets. For example, 
> ATCG should map to 0, 1, 2, 3 to get the best mapping for algorithms. 
> If we take the current char method we would get the int representation 
> of a particular character (e.g. A = 'A' = (int)'A' = I don't know the 
> ASCII number ;-) ), but that's not what we need.
> So amino acids should be mapped to 0..22 and nucleotides should be 
> mapped to 0..3.
>
> hope you got it, feel free to ask again ;-)
>
> Phil
>
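
The mapping Phil describes in the quoted message could look roughly like 
the sketch below. Everything in it is an assumption for illustration: 
the names code_for_char, char_for_code, encode_sequence and 
decode_sequence are invented, the A=0, T=1, C=2, G=3 order simply follows 
the "ATCG" example, and nothing here is the actual BCSymbolMapping 
design. It also shows the remap being done in the same pass that copies 
the sequence into a plain array.

```c
#include <stddef.h>
#include <string.h>

enum { INVALID_CODE = 0xFF };

/* Hypothetical forward table: ASCII byte -> compact code (A=0, T=1, C=2, G=3). */
static unsigned char code_for_char[256];

/* Hypothetical reverse table: compact code -> ASCII character. */
static const char char_for_code[] = "ATCG";

/* Build the forward table from the reverse table; unknown bytes stay invalid. */
static void init_mapping(void)
{
    memset(code_for_char, INVALID_CODE, sizeof code_for_char);
    for (unsigned char code = 0; char_for_code[code] != '\0'; code++) {
        unsigned char up = (unsigned char)char_for_code[code];
        code_for_char[up] = code;
        code_for_char[up + ('a' - 'A')] = code;   /* accept lower case too */
    }
}

/* Copy and remap in one pass. Returns 0 on success, -1 on an unknown symbol. */
static int encode_sequence(const char *seq, unsigned char *out, size_t len)
{
    for (size_t k = 0; k < len; k++) {
        unsigned char code = code_for_char[(unsigned char)seq[k]];
        if (code == INVALID_CODE)
            return -1;
        out[k] = code;
    }
    return 0;
}

/* Inverse mapping; out must have room for len + 1 bytes. */
static void decode_sequence(const unsigned char *codes, char *out, size_t len)
{
    for (size_t k = 0; k < len; k++)
        out[k] = char_for_code[codes[k] & 3];     /* mask keeps the index in 0..3 */
    out[len] = '\0';
}
```

With codes in 0..3, a nucleotide score table only needs 4x4 entries, at 
the cost of the encode and decode steps at the boundaries.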
*********************************************************
                      ** Alexander Griekspoor **
*********************************************************
                The Netherlands Cancer Institute
                Department of Tumorbiology (H4)
          Plesmanlaan 121, 1066 CX, Amsterdam
                   Tel:  + 31 20 - 512 2023
                   Fax:  + 31 20 - 512 2029
                  AIM: mekentosj at mac.com
                   E-mail: a.griekspoor at nki.nl
               Web: http://www.mekentosj.com

    The requirements said: Windows 2000 or better.
    So I got a Macintosh.

*********************************************************