[Biococoa-dev] BCSymbolMapping
Philipp Seibel
biococoa at bioworxx.com
Sun Mar 13 16:15:43 EST 2005
On 13.03.2005 at 21:58, Koen van der Drift wrote:
> Philipp,
>
> Could you expand a little on what this class would do? We already have
> BCSymbol methods to get a char for each symbol, so I am not sure what
> additional advantages your BCSymbolMapping class would have.
>
>
> - Koen.
>
Sure, Koen.
Charles wrote (in the thread "symbol mapping for optimization"; just look
in the list ;-) ):
I think what is really not obvious at first, and can be confusing,
is the separation between (1) and (2). It seems obvious that a char
should be the char corresponding to the BCSymbol; for instance, base 'A'
should be mapped to char 'A'. Maybe we will do that initially, but we
want to be able to change that in the future, or even to have a more
dynamic mapping depending on the context. For instance, we might find
later that mapping the bases ATGC to the chars 0x00, 0x01, 0x02, 0x03 is
much better than mapping them to the chars 'ATGC', because then there are
no unused chars between the used ones. We then only have to modify
the code in (2), probably just one or two lines, to
propagate whatever optimization we make in the translation to the whole
framework.
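To make the point about unused chars concrete, here is a minimal Python sketch, assuming a single translation function that every caller goes through (standing in for (2)). The names `encode_base`, `table_span`, and the `DENSE_CODES` switch are hypothetical illustrations, not BioCocoa API. With plain character codes, 'A' (65) through 'T' (84) span 20 slots for only 4 used bases; with dense codes the span is exactly 4.

```python
# Hypothetical sketch: encode_base() stands in for the single place, "(2)",
# where symbols are translated into codes. Flipping DENSE_CODES changes the
# encoding for every caller at once. Not BioCocoa's actual API.
DENSE_CODES = True

_DENSE = {"A": 0x00, "T": 0x01, "G": 0x02, "C": 0x03}


def encode_base(base: str) -> int:
    """Return the numeric code used internally for a DNA base."""
    if DENSE_CODES:
        return _DENSE[base]   # raises KeyError for anything but A/T/G/C
    return ord(base)          # plain character code, e.g. ord("A") == 65


def table_span(symbols: str) -> int:
    """Slots between the smallest and largest code, inclusive."""
    codes = [encode_base(s) for s in symbols]
    return max(codes) - min(codes) + 1


print(table_span("ATGC"))  # prints 4
```

Setting `DENSE_CODES = False` makes `table_span("ATGC")` return 20 instead, without touching any caller; only 4 of those 20 slots would actually be used.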
Phil (that's me ;-) ):
So we need an optimal mapping for particular SymbolSets. For example, ATCG
should map to 0, 1, 2, 3 to get the best mapping for algorithms. If we
use the current char method, we would get the int representation of a
character (e.g. A = 'A' = (int)'A' = 65 in ASCII), but that's not what
we need.
So amino acids should be mapped to 0...22 and nucleotides to 0...3.
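A per-symbol-set mapping along these lines could look like the following minimal Python sketch. The alphabets are assumptions for illustration, not BioCocoa's actual BCSymbolSet contents: the nucleotides use the ATCG order from above, and the 23 amino-acid codes (0...22) are assumed to be the 20 standard residues plus the ambiguity codes B, Z and X. `make_mapping` and `composition` are hypothetical helper names. The composition count shows why dense codes help algorithms: a plain list indexed by the code has exactly one slot per symbol.

```python
# Hypothetical sketch of a dense, per-symbol-set mapping. The alphabets are
# assumptions: ATCG for DNA, and 20 standard amino acids plus the ambiguity
# codes B, Z, X to fill the 23 codes 0...22. Not BioCocoa's actual API.
DNA = "ATCG"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY" + "BZX"


def make_mapping(alphabet: str) -> dict:
    """Map each symbol of the alphabet to a dense code 0 ... len-1."""
    return {symbol: code for code, symbol in enumerate(alphabet)}


def composition(sequence: str, alphabet: str) -> list:
    """Count symbols using a plain list indexed by the dense code."""
    mapping = make_mapping(alphabet)
    counts = [0] * len(alphabet)   # exactly one slot per symbol, none unused
    for symbol in sequence:
        counts[mapping[symbol]] += 1
    return counts


print(make_mapping(DNA))                        # {'A': 0, 'T': 1, 'C': 2, 'G': 3}
print(max(make_mapping(AMINO_ACIDS).values()))  # 22
print(composition("GATTACA", DNA))              # [3, 2, 1, 1]
```

Because the mapping is built from the alphabet, switching to a different symbol set (or reordering one) changes the codes in one place only.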
Hope that makes it clear; feel free to ask again ;-)
Phil