[Biococoa-dev] BCSymbolMapping

Philipp Seibel biococoa at bioworxx.com
Sun Mar 13 16:15:43 EST 2005


On 13.03.2005 at 21:58, Koen van der Drift wrote:

> Philipp,
>
> Could you expand a little on what this class would do? We already have 
> BCSymbol methods to get a char for each symbol, so I am not sure what 
> additional advantages your BCSymbolMapping class would have.
>
>
> - Koen.
>
Sure, Koen,

Charles wrote (in the thread "symbol mapping for optimization", just 
look in the list ;-) ):

I think what is really not obvious at first, and that can be confusing, 
is the separation between (1) and (2). It seems obvious that a char 
should be the char corresponding to the BCSymbol, for instance base 'A' 
should be mapped to char 'A'. Maybe we will do that initially but we 
want to be able to modify that in the future, or even to have more 
dynamic mapping depending on the context. For instance, we might find 
later that mapping the bases ATGC to the chars '0x00-0x01-0x02-0x03' is 
much better than mapping to the 'ATGC' chars, because we don't have 
useless chars in between each used char. We then just have to modify 
the code in (2), and probably only one or two lines of code, to 
propagate whatever optimization we make in the translation to the whole 
framework.
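
A minimal sketch of the separation Charles describes, written in Python 
for brevity rather than in BioCocoa's Objective-C. The names 
ASCII_MAPPING, DENSE_MAPPING, ACTIVE_MAPPING, encode and decode are 
hypothetical and not part of the framework. The idea is that all other 
code goes through encode/decode, so switching the mapping is a one-line 
change:

```python
# Hypothetical mappings for illustration; neither is BioCocoa API.
ASCII_MAPPING = {base: ord(base) for base in "ATGC"}        # 'A' -> 65, ...
DENSE_MAPPING = {base: i for i, base in enumerate("ATGC")}  # 'A' -> 0, 'T' -> 1, ...

# The single place that decides which mapping the rest of the code sees.
ACTIVE_MAPPING = DENSE_MAPPING
_REVERSE = {code: base for base, code in ACTIVE_MAPPING.items()}


def encode(sequence):
    """Translate a string of bases into a bytes object of mapped codes."""
    return bytes(ACTIVE_MAPPING[base] for base in sequence)


def decode(codes):
    """Translate mapped codes back into a string of bases."""
    return "".join(_REVERSE[code] for code in codes)
```

Changing ACTIVE_MAPPING to ASCII_MAPPING leaves every caller of 
encode/decode untouched, which is the kind of localized change the 
paragraph above has in mind.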

Phil (that's me ;-) ):

So we need an optimal mapping for specific symbol sets. For example, 
ATCG should map to 0, 1, 2, 3 to get the best mapping for algorithms. 
If we used the current char method, we would get the integer value of 
the character itself (e.g. A = 'A' = (int)'A' = 65 in ASCII), but 
that's not what we need.
So amino acids should be mapped to 0...22 and nucleotides should be 
mapped to 0...3.
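
To illustrate why dense indices suit algorithms, here is a small sketch 
in Python (not BioCocoa code; NUCLEOTIDES, INDEX and count_symbols are 
made-up names). A table indexed by the dense code needs exactly one 
slot per symbol, while a table indexed by the raw character code must 
be large enough for the highest code in use:

```python
NUCLEOTIDES = "ATCG"  # order used in this message: A=0, T=1, C=2, G=3
INDEX = {symbol: i for i, symbol in enumerate(NUCLEOTIDES)}


def count_symbols(sequence):
    """Count occurrences using a table with one slot per symbol."""
    counts = [0] * len(NUCLEOTIDES)  # 4 slots, none wasted
    for symbol in sequence:
        counts[INDEX[symbol]] += 1
    return counts


# Indexing by raw character code instead needs a table that reaches the
# highest code used ('T'), leaving most slots untouched.
ascii_table_size = max(ord(symbol) for symbol in NUCLEOTIDES) + 1
```

Here ascii_table_size comes out as 85 slots versus 4 for the dense 
table; the same reasoning would apply to a substitution matrix indexed 
by pairs of symbols.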

Hope that makes it clear; feel free to ask again ;-)

Phil



